13. 08. 2017.

How to have a higher chance of success when restoring a big MySQL database

Restoring a MySQL database is fast and easy when you can simply copy the files into the datadir while the server is shut down, or when you use Percona XtraBackup.

But if for some reason (AWS RDS) you only have the MySQL protocol available for backups, you usually end up with a compressed mysqldump. That is quite slow to restore - not because of the compression, or because the decompressed dump is a text file that needs to be parsed, but because MySQL is slow to push it through its disk pipeline and has to build the indexes while restoring.

I've spent multiple days babysitting the restore of a 7GB gzip-compressed MySQL dump file, and these are the results and tips that could save you some time.


So, make sure that:
- you have enough IO available: restoring a 66 GB datadir wrote 315.6 GB to the drive (as measured with iostat), even with a tuned MySQL configuration. For a DB of this size a mechanical drive doesn't cut it and the restore will take multiple days. Use a good SSD.

- your database TRIGGERS all have BEGIN/END statements. Even though you can create them without, and even though the bug was supposed to be fixed (https://bugs.mysql.com/bug.php?id=16878), the restore fails without them on every version of MySQL 5.6/5.7 I tried.

- you start with a really empty database in your datadir. The DB I worked with had inconsistent data types on a foreign key: when the dependent table with the inconsistent key already exists, MySQL reports a foreign key error (MariaDB is more informative there), but if it doesn't exist yet, MySQL will happily restore the database.

- your max_allowed_packet value is big enough, or your client will get a "MySQL server has gone away" message while restoring.

- your innodb_log_file_size is big enough (https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_log_file_size) - if you have large BLOB values in your DB, the restore will fail if this value is lower than 10% of your largest BLOB. The setting also matters for restore speed.

- you have log-bin turned off, to minimize the chance of running out of drive space and to save IO (log-bin=Off doesn't mean binary logging is disabled, just that the binlog files start with "Off" - the documentation can be confusing here :). What worked for me was commenting out all log-bin lines in the mysqld config section. A quick way to check most of these settings before you start is sketched right after this list.
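
Before kicking off a multi-hour restore, a quick pre-flight check along these lines can save a failed attempt (just a sketch - db_dump.gz is the dump used in the example below, and the variable check assumes you can connect as root):

# count the triggers in the dump and eyeball a few of them for BEGIN/END bodies
zcat db_dump.gz | grep -c "CREATE.*TRIGGER"
zcat db_dump.gz | grep -A5 "CREATE.*TRIGGER" | less

# confirm the server is actually running with the settings described above
mysql -uroot -e "SELECT @@max_allowed_packet, @@innodb_log_file_size, @@log_bin;"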


Finally, if you want it to finish quickly, use the fastest SSD you have available and consider tuning the MySQL configuration a bit. I'm also considering using a ramdisk, since it would help both with restore speed and with any DB transformations you need to do afterwards. The MySQL defaults are not reasonable, especially for innodb_log_file_size and max_allowed_packet.

I used the excellent pv to figure out whether the restore would finish in a reasonable time:

pv db_dump.gz | gunzip -c | mysql -uroot database_name



Here's the full set of MySQLd configuration variables that worked for me on my dev laptop:

#my dev laptop is low on memory, for a prod server you would use a lot more
innodb_buffer_pool_size=512M
innodb_additional_mem_pool_size=256M
innodb_log_buffer_size=256M
innodb_log_file_size=512M
max_allowed_packet=64M

#for saving disk IO, don't use in prod
innodb_flush_log_at_trx_commit = 2
innodb_flush_method=O_DIRECT_NO_FSYNC
skip-innodb_doublewrite




30. 07. 2017.

Some thoughts on (Modern) PHP



I have experience in both Java and PHP: Java mostly for traditional desktop apps and embedded UIs, and PHP for websites.


Custom PHP frameworks I've built or helped build took into account the way PHP is executed: you are stateless and need to set up everything on every request (the runtime is fast to start with FPM and an opcode cache). A cheap namespace-based autoloader worked great. We used singletons for getting the configuration and the DB connections. There was almost no setup code that needed to run every time, other than loading an .ini-based configuration and connecting to the DB. My webapps responded in under 20ms (DB and other services like Sphinx included), and I could get them to respond in 1ms for things where we needed to be quick and didn't have to output HTML with forms. The framework was really small and you could read its whole code in 1-2 hours. It worked with SQL in a reasonable way: you didn't have to write SQL for simple CRUD, but for larger things involving joins over multiple tables and more complex expressions we wrote native SQL. Caching was done thoughtfully, using the APC user cache (SHM with zero copy). It just felt nice.


I switched jobs recently and started with Symfony 3. The thing felt like some Java framework, but poorly documented and harder to use than it should be. There's lots and lots of setup code running before every request is handled. There's a whole DI framework with its load of setup code for every component, and you have to do that setup even if you don't use the component in that particular request. There are ways of doing setup lazily, but you still waste time wiring that up. Framework overhead can be 30-100ms, and other modern PHP frameworks often have similar overhead. I know there's PHP-PM to save some of the work that isn't really $_REQUEST specific, but it doesn't seem to be used much in production. And using Silex (deprecated by Symfony 4?) is really not that different: you still either reuse Symfony components or rewrite them, with similar "best practices" inspired by Java.


Regarding persistence, Doctrine and its verbosity feel very ugly to me. I'd much rather use SQL syntax for defining relationships than a bunch of PHP with special syntax comments, XML or YAML - and use real SQL for complex queries.



Everybody is using type hints wherever they can, and it feels as verbose as Java, but without compile-time type safety, and you can't really put type information everywhere (class members, for example).


So you are almost using a type-safe language, but you get neither the performance nor the compile-time benefits, because inevitably you'll have to use some dynamic typing or other dynamic language features.

Even though the PHP runtime has made great progress with 7.x (it's probably the fastest interpreted language, and it's great that it has reference-counted garbage collection), the language feels like it's struggling to find its identity: it takes a lot from Java while still coping with ugly legacy ($, having to use $this inside class methods, php.ini, features for supporting templating even though it's rarely used as a templating language in modern frameworks, https://3v4l.org/mcpi7).


Learning Python and Flask (as an example) was much more enjoyable than switching from a nimble custom PHP framework to Symfony. Using NodeJS and minimalistic components to build my own framework was also nice. I'd love to try Go, Swift or Rust on the backend too.


And then there's the fact that most PHP frameworks try too hard to be full stack, when nowadays it's not rare to only build REST APIs on the backend. So there's a lot of infrastructure and assumptions in place for rendering HTML that you don't really need, that gets in the way when learning the framework, and that is wasteful when the code executes.


I'd argue you can write fast, simple and maintainable PHP by using the state the PHP runtime has already set up for you ($_POST, $_GET, $_SERVER etc.), namespaces and a namespace-based autoloader, using pure functions where you can (static classes shouldn't be a sin - use them to split your code into sensible parts), and following general good practices for readable and maintainable code (avoid long functions, huge classes and too much block nesting; decouple; name things well). With some coding conventions you can write a decent and productive framework quickly - but you could do that with a nicer language too, so what's the point?

(Thankfully, I'm not using Symfony at my new job. Yii2 suffers from some of the same issues, but at least it feels better for now.)




11. 07. 2017.

Open Source is not a business model

A common theme nowadays in open source developers' circles is that you can't make a living writing open source. There are sad accounts of people abandoning their (popular) open source libraries, frameworks or programs because they suck up too much of the author's time with little or no financial gain. Others try their luck with Kickstarter or Indiegogo campaigns to fund a few milestones of their project, or set up a donation system via Gratipay, Flattr or Patreon.

This conflates several different approaches to making money off of open source, each of which requires a different way of thinking about how the money is related to the work.

One way to get paid writing open source is to work (or consult) for a company heavily involved in open source. One such example is Collabora, one of the world’s largest open source consultancies[0]. To a lesser extent, if you’re using a lot of open source software in your day job, you can try to convince your boss to allocate some hours towards contributing back[1]. A great thing about this approach is that you don’t need to worry about making money this way — your employer does that.

All of the other approaches require you, the developer, to actively work on getting paid. Open source, by itself, is not a business model. You can build one around it, but you must work (extra) for it.

One approach is Open Core: have the base project be open source, but then create additional proprietary products around it (or versions of the project) and sell them. There are a number of examples of this approach - Nginx, for instance.

A similar approach is using different licensing schemes for commercial and open source usage (e.g. GPL plus a proprietary license for customers that can't or won't use GPL-licensed software). While this can work, it depends on having enough customers who need the proprietary license. Projects using this scheme (for example, Qt) have trouble attracting contributors, since contributors have to sign away their copyright.

Another approach is open source consulting. The project is entirely open source, and the revenue comes from charging for customisation, integration and support. If you're the author of a popular piece of software and constantly get feature requests (or bug reports) from people insisting they need them, ask them to pay for it and voilà - you're an open source consultant. A nice thing about this approach is that you don't even need to be the primary author of the open source product; you just need to be an expert on it.

Is there a way to just write open source and get paid? Yes — grants and fundraising. Grants, such as Mozilla's or Google's, or Kickstarter/Indiegogo campaigns (Schema migrations for Django or Improved PostgreSQL support for Django, to name two I've backed) allow recipients to focus on the open source project without needing to build a company around it. But they also require work: applying for the grant, preparing and promoting the campaign (it also helps if you're already recognised in your community, so that there's trust that you can deliver on the promise). Failure to do this less appealing work will result in failure to attract grants or donations, and you're back to square one.

An approach that does not work is chugging along with your open source development and just pasting a Gratipay, Flattr or Patreon button[2] on your page. You may fund your coffee-drinking habits that way, but you're not likely to be able to live off of it. A day may come[3] when this becomes a viable model, but currently it is not.

Hoping that "if you build it, they will pay" is as disastrous as "if you build it, they will come". You can make money off of open source, but you need to think it through, devise a business model that suits you (and that you like) best, and then execute on it.


[0] I used to contract with Collabora, they're an awesome bunch and have a number of job openings, many of them remote.
[1] This is what we do at Good Code, a company I run, where we’ve got several Django contributors and encourage community involvement.
[2] I’m not disparaging any of these. I do believe they’re great attempts (and Patreon works really well for some types of projects,for example The Great War).
[3] If it ever becomes a reality, Basic Income would be a great thing for open source. I’m not holding my breath, though. A refined Gratipay/Flattr/Patreon/Kickstarter/Charitystorm model that works is more likely.

Facts and Beliefs

Software must have a purpose[0].

This is both self-evident and controversial. If we're expending serious effort to build it, it better have a purpose, but on the other hand, the purpose (the why) and the behaviour (the what) are often conflated.

Purpose is like a mission statement - a concise description of why something is being done that can be used by anyone to double-check that whatever they're working on is actually relevant and beneficial.

Purpose is not a list of features or requirements. What the software does should always be secondary to why it is needed in the first place. But teams often miss the forest for the trees and think in terms of functionality - features and requirements.

New vs Improved

The purpose of a software project is either to improve an existing process[1] or to facilitate something that wasn't possible before.

Building software to improve an existing process is often considered boring and uninspiring, but there is a positive side to it - the problems with the current process are usually known, they can be analysed, and good, reliable data can be obtained on how to improve it.

When building something new, the people thinking about what needs to be built need to guess. While their assumptions may be close to the target and sometimes they may get lucky, there is simply no way to exactly know what's going to be needed.

Facts vs Beliefs

For projects in the first category, improving an existing system, the designers can rely on hard facts to at least some degree. The important thing here is knowing what exactly is a real fact and what is a guess. Too often, the designers, the client[2] and the developers add in a number of assumptions, guesses, and "we can also do this easily so why not"-s.

These beliefs are still just guesses and can, and often do, turn out to be wrong. But throughout the software development project, they are treated as reverently as facts.

In some cases, even the real facts are tainted by the analysts filtering them (possibly without even being aware of it) through their own set of preconceived ideas.

I firmly believe this is one of the major reasons for the sad state of affairs in which the majority of software projects fail.

A good step in the wrong direction

Agile software development methodologies recognize this, but their workaround is to treat everything as a belief. Thus they are optimized for changing direction quickly when beliefs are invalidated or new ones introduced. In practice, many organisations adopt Agile in name but not in spirit[3].

In those that do, the pressure is often to focus on short-term development performance (velocity as a target) and on churning out user stories by the week, making it easy to forget about the big picture.

Another problem with Agile in practice is that the feedback loop, which is supposed to be as short as possible, often isn't. In practice, the feedback is provided by the project managers or by QA testing the implemented user stories. The software doesn't actually get used, and the beliefs aren't really tested, until much later, turning these projects into waterfalls in Agile's clothing[4].

From facts to requirements

The more facts about the problem being solved, the better - provided that they are analysed correctly, without bias, and that they are precisely and unambiguously stated.

For example, a fact might be that, using the current software, an operator fills in the (imaginary) FT1P form in 10 minutes on average, costing the company $10,000 a month in aggregate.

A similar statement, that the FT1P form is hard to fill in so the operators are disgruntled, can also be taken as a fact if it's the strong opinion of everyone touching FT1P, and not just a complaint from one person. If so, further analysis can be done to see exactly what makes the form hard to fill in.

Facts help set constraints on the system. The above facts can produce requirements such as "keyboard shortcuts for moving around the form", "implement autocomplete on fields Foo and Bar" or "the maximum response time for an autocomplete must be under 500ms".

When producing requirements, either from facts or from beliefs, it makes sense to explain why they're needed, as this context is useful information to the system designers and developers. In the above example the explanation could be that these are needed to make it easier for operators to fill out the form in under 5 minutes, which is expected to save the company $5,000 a month.

Avoiding disaster with beliefs

The first step in avoiding a trap is knowing it exists. Likewise, the first step in avoiding disaster when holding a belief is knowing it can turn out to be false, and preparing for that.

The best way to deal with it is to figure out a way to prove or falsify the belief as quickly as possible. In theory, sprint feedback in Agile methodologies should do that. In the Lean Startup movement, a Minimum Viable Product (MVP) should do that. In science, we call that an experiment. In engineering, it's called building and testing a prototype.

The main thing here is that the prototype should be as cheap and as quickly built as possible, while still being tested in real-world conditions. For most software, that means giving it to the real end users and seeing what they do with it, not asking your boss to try it out and see if she approves.

If this is not possible, one strategy is to keep both options open (organise the software so it can be easily adapted once the belief is resolved), if the complexity cost of keeping them both open is expected to be small; or to choose the simpler option[5] and refactor if needed, if the cost of a future refactor is expected to be smaller than the cost of keeping both options open. Either case will result in technical debt due to the uncertainty of the situation.

Agile vs Waterfall

It might seem that I favour the old Waterfall approach and believe Agile is overkill, but that's not so. I think both are extremes at opposite ends of the spectrum. Waterfall assumes everything is a fact. Agile assumes everything is a belief. Sometimes one or the other is correct. But most of the time, the truth is somewhere in the middle.

Some things are known knowns (facts), some things are known unknowns (beliefs, if correctly recognized), and, no matter how much we try to reduce them, some are unknown unknowns (surprises). So an agile approach (not necessarily an Agile methodology), along with leaving some slack in the schedule and planning, is crucial for the success of the project.


[0] I mean this in a somewhat narrow context of software that's result of a serious, non-trivial, professional software development activity. Not all software strictly must have a purpose - the development process itself can be a goal, the software medium can be an art form, and so on.

[1] This applies to such niches as games - where the game is either a new one (an original or a new installment in an existing franchise), or a small update to an existing title.

[2] Or someone higher up the food chain if an internal project, or the product manager in case of productized software.

[3] A division of a big multinational household-name company I worked with practised Scrum by putting the initial multi-month project plan into the backlog (hundreds of tasks); each sprint we'd move a few things from the backlog into the current sprint and call that Agile.

[4] In the context of producing software to address new and anticipated needs, essentially software startups, this is a matter of life or death for the company. The Lean Startup movement aims to fix this and has some good ideas.

[5] I lean heavily towards the simple option whenever possible, since complexity itself incurs additional cost. Read more about this in my Progressive Fibonacci post.

Progressive Fibonacci

Hey Protagonist, gimme a function calculating and returning the N-th Fibonacci number.

Here you go:

def fib(n):
    if n == 0 or n == 1:
        return n
    return fib(n - 2) + fib(n - 1)

Hey, it crashes if I give it a non-natural number, that shouldn't happen. It should nicely report the error instead.

Okay, no problem, here's a modified version:

def fib(n):
    if (type(n) != int and type(n) != long) or n < 0:
        raise ValueError('n should be a non-negative integer')
    if n == 0 or n == 1:
        return n
    return fib(n - 2) + fib(n - 1)

Hmm, it's really slow for, like, N = 50.

Yeah, it's exponential. Here's an improved version that caches the intermediate results so they don't need to be recomputed:

def fib(n, cache={}):
    if (type(n) != int and type(n) != long) or n < 0:
        raise ValueError('n should be a non-negative integer')
    if n == 0 or n == 1:
        return n
    if n not in cache:
        cache[n] = fib(n - 2) + fib(n - 1)
    return cache[n]

WTF ?!

Okay, okay, exploiting mutability of default arguments in Python is kinda evil. Here's a more readable version:

def fib(n):
    if (type(n) != int and type(n) != long) or n < 0:
        raise ValueError('n should be a non-negative integer')
    cache = {}  
    def f(n):
        if n == 0 or n == 1:
            return n
        if n not in cache:
            cache[n] = f(n - 2) + f(n - 1)
        return cache[n]
    return f(n)

Umm, I'm tight on space on this device, can it use a constant amount of memory?

Sure, here you go [0]:

def fib(n):
    if (type(n) != int and type(n) != long) or n < 0:
        raise ValueError('n should be a non-negative integer')
    a, b = 0, 1
    for i in xrange(n):
        a, b = b, a + b
    return a

Cool! That solves it! Umm, I only actually ever need to calculate the first 20 Fibonacci numbers...

No worries, use this:

def fib(n):
    seq = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
    try:
        return seq[n]
    except IndexError:
        raise ValueError('n is out of bounds')

Which implementation is the best? The answer will shock you - all of them! (well, except maybe the evil one). Each implementation addresses all of the requirements put forth so far in the dialogue.

If you've chosen any of the above as "the best", you've added a few of your own unstated, undocumented, unchecked assumptions about the requirements into the mix. Don't do that.

Do the simplest thing that could possibly work[1].


PS. Not related to the discussion, but definitely related to Fibonacci, is a comment over at reddit describing the Fast Fibonacci Transform, a method for calculating the N-th Fibonacci number in logarithmic time - well worth a read.
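
For reference, the core of the fast-doubling approach fits in a few lines; this is my own sketch of the idea, not the code from the linked comment:

def fib_pair(n):
    # returns (F(n), F(n+1)) using the fast-doubling identities:
    # F(2k)   = F(k) * (2*F(k+1) - F(k))
    # F(2k+1) = F(k)^2 + F(k+1)^2
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)
    c = a * (2 * b - a)
    d = a * a + b * b
    return (d, c + d) if n & 1 else (c, d)

def fast_fib(n):
    return fib_pair(n)[0]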


[0] I cheated a bit, as this is clearly simpler than the previous versions. If only real-world problems were as clear-cut as generating Fibonacci numbers!

[1] Where "work" is defined as "satisfy all the explicit requirements".

20. 05. 2017.

Here

We could run a study and analyse the article text of the top 100 local news portals, but it would surely only confirm that the most common link text is the word here.

Do journalists and editors think the average visitor to their pages is so clueless that they wouldn't recognise a link and click on it unless it says here? Or is it the pot syndrome: tradition says this is how it's done, and sometimes it's just too much bother to come up with proper link text.

Maybe the SEO experts are partly to blame too? When the first journalist wrote proper link text, the way it should be done, an SEO expert jumped up, rapped him over the knuckles with a stick and told him he mustn't spill link juice so carelessly. Especially if the text comes from a competing portal. We're surely not going to boost their relevance - delete that, write here. The rule was set and everyone sticks to it, just like in the experiment with the monkeys and the bananas.

Legend has it that Google (and when we talk about search engine optimisation we don't really look at anyone else) builds a link profile from the link text. So if you optimise your page content for a keyword, and all the links carry that same text, Google will penalise you. I do wonder what Google thinks about the word from the title. I mustn't mention it too many times in the text or it will think I'm trying to rank well for it ;-).

I know you don't believe me, because no one is a prophet in their own village, and I'm no SEO expert anyway, so let me list a few articles about proper links that I found with a quick Google search. That's decent proof they know how to write links, otherwise they wouldn't be among the top few results, right?

Anchor Text Best Practices For Google & Recent Observations was written by SEO expert Shaun Anderson, and in it he clearly says "Don't Use 'Click Here'". His observations about the optimal length of that text are interesting too.

The famous Moz also covers Anchor Text and says "SEO-friendly anchor text is succinct and relevant to the target page (i.e., the page it's linking to)."

A proper link description is the very essence of the web as such. It is a kind of equivalent of the footnote in a printed text. Imagine reading a text where, instead of the words the notes refer to, the word here keeps popping up. That would be a bit tiresome to read.

Optimise your texts for reading, for your users. Links should look different from the rest of the text (that's defined by your website's CSS) and should briefly and clearly describe the content they lead to. And that's all you need to know to take part in the eradicate here campaign.

28. 04. 2017.

Potemkin opens up the data

It's election season. That means mayors and municipal heads across the country will be presenting all kinds of projects to show and prove that they deserve another term. Some of them want to show they are keeping up with the times, so they follow global trends in local government. They even promote themselves at IT conferences.

That's how we learned that Virovitica has become a smart city: an open data portal and a portal for reporting utility problems have been officially launched.

The MyCity portal we can't see yet, because all that greets us there is the default IIS page. The open data portal is functional, and on it we have, in both letters and figures, 6 (six) datasets. I really wonder how those 6 Excel files will help grow the digital economy, and how much the City of Virovitica will pay for it?! The press releases don't mention that figure.

It was completely unnecessary for the City of Virovitica to launch this project, because the Open Data Portal of the Republic of Croatia already exists and they could have uploaded those 6 files of theirs there. The cities of Rijeka, Zagreb and Pula used that option and together have published 123 datasets. Yes, but then what would mayor Kirin have to show off?

It's been announced that Velika Gorica and Varaždin are in the pilot phase of a project like this. I suppose that means they can't decide which 6 Excel spreadsheets to upload. The civil servants in those cities must be under great pressure to get it done before the local elections.

26. 02. 2017.

Why you need to verify your users' e-mail addresses

This morning I once again received someone else's mail. The e-mail address is mine, but the content isn't meant for me. Someone registered, entered my e-mail address as their own, ordered some things and received an order confirmation. The web shop that person used doesn't verify the e-mail addresses its users sign up with.

This is the most basic thing: before any other action you should verify the e-mail address (the usual way is to send a token/verification code to that address). If it's a service where you spend money and they haven't implemented even that basic first step, how can you trust them to process and store your card numbers or other confidential data securely? Better not to use such services.
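
A minimal sketch of that first step in Python (the in-memory dictionaries and the URL are placeholders for whatever storage and routing a real service would use):

import secrets

# in-memory store, just for the sketch; a real service would persist this
_pending = {}
_verified = set()

def start_verification(email):
    # generate an unguessable, single-use code and e-mail a link containing it;
    # no orders or payments should be possible until the address is confirmed
    token = secrets.token_urlsafe(32)
    _pending[token] = email
    return "https://shop.example/verify?token=" + token

def confirm_verification(token):
    email = _pending.pop(token, None)
    if email is None:
        return False  # unknown or already used token
    _verified.add(email)
    return True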

In the worst case, you may by mistake enter the address of someone who will abuse the opportunity. You register at some web shop where you've left your credit card number, you're happy because it has a 1-Click-to-buy option, and you've entered the wrong e-mail. The mischievous lucky recipient will set a new password (which, as the owner of that e-mail address, they can do), change the delivery address and merrily go shopping. You will most likely only understand what happened when your credit card statement arrives.

Over the last few years I've received all sorts of messages. From invoices, pro forma invoices and quotes, through various minutes and seminar notices, all the way to love letters full of silly little poems and YouTube links to turbo-folk. When I'm in a good mood I reply to the sender and point out the wrong address; when I'm not, the messages go straight to the trash.

The only case where I always try to clear up the mix-up is the test results that hospitals send to patients, or notifications about scheduled examinations. What horrifies me is that for such serious matters our health system uses such unreliable methods. A matter of life and death can hinge on a misdirected message. What do we have the e-Građani system for, if it isn't used for purposes like that?

How did I get hacked?

Among the general public there is a myth about hackers who use computers to break into other people's computers and steal money and users' identities. In practice it mostly doesn't work that way, because the easiest way to pull off such mischief is to attack the weakest link: people. Kevin Mitnick earned his fame through social engineering, which is a more learned term for conning people. It's easier to fool people than to break into a server.

A few months ago I decided to buy cinema tickets online. Since I rarely buy tickets online, I wasn't surprised that my saved password (I use KeePassX for that) didn't work. I put it down to my own carelessness; for less important services I sometimes forget to record a password change in KeePassX. I went through the forgotten password procedure, logged in again, picked the time and the seats, and got to the step just before payment. Suddenly something felt off. I checked the details and saw my first name, but someone else's surname. I looked at the screenings I had supposedly attended and saw a couple of films I'm certain I hadn't watched in the past few months. Someone had been buying tickets with my account. I hadn't used the option to remember my card number for faster purchases, so there was no damage (I checked my card statements as well); the hijacker wasn't buying tickets with my money, nor had he left his own card number.

After a few messages with customer support I managed to work out how the whole situation came about. I warned them that they have a security problem. Nobody broke into their server, no data leaked, but there was a hole in their procedure and it all came down to human error. Alongside the regular online registration they offer the option of filling out a form and getting your credentials immediately. The person who took over my account did exactly that: he filled out the form, but instead of his own he entered my e-mail address. And this is where human carelessness or ignorance comes in - the operator looked up my account by e-mail address, set new credentials and handed them over to him. Without any verification of the e-mail address given.

Now do you see how easy it is to take over someone's account at that cinema operator?! All you need to know is the e-mail address of an existing user, and to hand in a manually filled-out form. If that user has enabled the quick ticket purchase option, you can enjoy the movies for free. Until they catch you.

I tried to prove to customer support that they have a big problem, but the people I talked to either didn't understand or didn't want to admit it. I was even told it was impossible for this to happen - and yet it happened to me. :-)

Crackers

Hackers, as a rule, don't break into computers, and it's an insult to use that name for those who stay true to the original idea of hacking. The criminals (let's call them by their real name) are usually called crackers or black hat hackers.

A solution to the problem?

Verifying users' e-mail addresses is the first, logical step. Another option is to use a suitable trusted service for identifying users. Government and public institutions in Croatia could use NIAS for that.

The basic problem with NIAS is that they needlessly allowed a whole range of third-party credential issuers, which introduced extra complications and increased the security risk. But that's a topic for another time...

04. 12. 2016.

The pot syndrome

A wife used to cook a fantastic duck stewed in a pot. Her husband was delighted every time, but something kept nagging him, so he asked her: "Everything is great, there's just one thing I don't get. Why do you cut the legs off the duck and cook them separately?" "That's the recipe my mum cooked by. If you really want to know, we can ask her." When her mum came over they asked her, and she answered: "That's how grandma always did it. We can ask her." When they visited grandma and asked about the cut-off legs, she laughed. "My dears, we were poor and didn't have large cooking pots. I cut the legs off so the duck would fit in the pot. If you have a big enough pot, you don't have to cut off the legs."

In IT we run into the pot syndrome far too often. Things get done out of habit, because that's how they were done before. People use the applications everyone uses even though better and more suitable solutions exist. The best example is Photoshop, which everyone had installed on their computer (99% of them a pirated copy), as if it were the only application that can resize an image or crop out the part that bothers them.

Less experienced users often do everything one way because they've learned only one way of solving the problem. They're afraid to learn, or to leave their comfort zone. They're often mistaken too, and what they consider a safe zone is in fact quite unsafe.

Professionals aren't immune to pots either. In programming, people often look at the old code and keep programming in the same style. Libraries get used just because they're popular, because everyone uses them, because big players stand behind them. JavaScript and npm deservedly take the title of kings of all pots. Nobody knows how to download a library by hand any more - the fingers fly on their own and type npm install. And it's often done for some simple piece of functionality they could have written themselves.

Of course you don't have to question and re-examine your methods, existing code, libraries and tools every single day. But now and then, ask yourself why you're doing something in exactly that way.

01. 12. 2016.

How did Bernardić miss the first ball?

Scott Adams (Dilbert's dad) wrote a post about the first steps of a new CEO, with a look at Donald Trump. He laid out a few interesting points that we're either unaware of or deliberately ignore.

He notes that people are not rational and that our emotions are forever ruled by the first impression. That's why a smart CEO (or party president) tries, in the very first days, to make a visible change, to score a win. You look for something visible, something everyone will remember, something the media won't fail to cover with big headlines, something at the essence of the brand - and something that's easy to change.

So what could Bernardić have done? Somewhere in the margins of the political news we could read that SDP did not replace the president of its parliamentary group. At this point you can picture John Oliver in his trademark manner yelling "what kind of fool are you, why didn't you do that?!?"

The president of SDP's parliamentary group is Zoran Milanović, who takes pleasure in not showing up in Parliament. That's a ball teed up for a volley. Bernardić should have removed him, wiped the floor with him, and at a special press conference (never mind that we all hate special press conferences by now) declared that there is no longer a place in SDP posts for people who don't work, and that from now on the only things that count are work, work and more work. Milanović doesn't care about that post anyway. It was easy to change, the media would have pounced on the bone, and Bernardić would have left a different first impression. I don't mean on SDP members, they've already had their say, but on potential voters.

But no, he started out slowly, carefully, not wanting to step on anyone's toes... Maybe Bernardić is working at full steam, but we don't know that. The chance for a good first impression is gone.

Milanović became SDP president precisely because of that first impression. While everyone else hesitated and kept their heads down after Račan's death, he was the first to step forward and announce his candidacy. The others could never catch up with that head start.

Instead of pressing hard, with the local elections 6 months away, Bernardić is easing into it. And then the election results will come as a surprise. SDP's PR experts must be asleep, and the support staff are so tired from the elections that they haven't even managed to change the president's photo on their website.

P.S. Plenković didn't start with an unforgettable first impression either. I guess he didn't need one - the party fell into his lap on its own.

30. 11. 2016.

Smart people know what the Twitter filter is for

Twitter is a somewhat more demanding platform than Facebook. You do have to pick whom you follow if your timeline is to resemble anything. And when posting, you have to rein in your inner graphomaniac and squeeze your thought into 140 characters.

That limit is exactly why Twitter is my favourite social network. It forces me to drop the filler phrases I otherwise use without noticing. And once I've got rid of those, it pushes me again to keep only what matters, with no ornamentation. All the other users are forced to do the same. That's why Twitter is a constant stream of information. There's garbage in that stream too, but since the garbage is small, it quickly disappears in the flood. Facebook, on the other hand, is a slow, algorithmically determined quagmire in which, when a piece of garbage does hit you, it's usually a big chunk. The cure against garbage on Facebook is to pick the Hide post, See fewer posts like this and See less from Firstname Lastname options, which let you influence the algorithm.

Twitter users don't have such an algorithm (thankfully), so they read everything in order (or don't read at all, but that's another story). And then certain individuals suddenly start protesting that they'll leave Twitter because you keep talking about Ljubav na selu. Fine, sometimes it's Ples sa zvijezdama, Eurovision, Game of Thrones, and some are annoyed by the approaching Rogue One mania. Their whining won't change anything, it just brings even more garbage into the channel. With some, the problem is a thirst for attention, ego, their own interests, or grief that the timeline went one way when they wanted it to go the other.

We live in a time when we're bombarded with information; it's a forest, a torrent, a flood. There is no chance of stopping it. After more or less resistance, the torrent sweeps everything away. What you can do is filter your timeline. Almost all Twitter clients support that, and Twitter itself has Advanced muting options. Use them. In the time ahead you'll need that everywhere. The ability to filter and to pick out the right, reliable information might one day even become a respected and well-paid profession. Some will say that computers and artificial intelligence will do it instead of people. Fine, maybe they will; people do like to avoid tedious work. Then training such filtering systems could become the well-paid profession.

Choose the people you follow carefully. You don't have to return every follow right away with a sharp backhand. Don't be one of those cowards who won't click Unfollow for fear of losing a single follower. Ah yes, nowadays cowards use the mute option. Sometimes someone gets on your nerves, goes over the top, but you don't want to unfollow them because now and then they blurt out something interesting? Experience shows that in that case somebody will retweet it anyway - you won't miss a thing.

For your own mental health, and everyone else's, it's better to walk past what you don't like. You don't have to trip over every piece of nonsense and then pester your followers with it. If you want a bit of attention, say so honestly and someone will help you out. For everything else there's the filter and Unfollow.

P.S. I live in the countryside. I don't follow Ljubav na selu. ;-)

28. 11. 2016.

Putting lipstick on public procurement

For the first few years, while the Public Procurement Act was being rolled out, I worked on a public procurement application, so I got to know the law, the technical details, and the practices of individual contracting authorities.

The economically most advantageous tender

Changes to the Public Procurement Act are being prepared, and from minister Dalić's announcements we could hear that the lowest price will no longer be the sole criterion for evaluating bids and that the economically most advantageous tender is being introduced. Yet the possibility of economic evaluation of bids has existed since the law was first introduced - the lowest price was never the only allowed criterion, it's just that contracting authorities never used that option. It was too much hassle for them, too much work.

Will the economic criterion really increase transparency? Does it mean that once the decision is made, all the bids and their itemised scoring will be published? Because without that there is no transparency.

Knowing how contracting authorities adapt to their favourite bidders, I see only added opportunities for corruption here, because the favourite bidder will no longer have to be the cheapest, and the economic criterion can be tailored exactly to their measure.

An overpriced and opaque gazette

I assume that the talk of cutting para-fiscal charges refers to the price of publishing in the Electronic Public Procurement Gazette of the Republic of Croatia (EOJN). Every tender requires at least two notices (the Call for Tenders and the Contract Award Notice), which costs the contracting authority 1,900 kn. Just for publication in the Electronic Gazette - that doesn't include publication in the printed edition. It's obvious that such pricing greatly favours Narodne novine, for whom it's a nice source of income. It is far, far more than a realistic (not to say market) price for a single notice in a publishing system.

The other problem with the EOJN is that there is no easy, non-discriminatory access to the data in machine-readable form. The previous version at least allowed partially successful scraping of the data, but the current solution is so badly structured that it's easier to extract the data from a PDF document than from the HTML notice. As if it were made that way on purpose?! Structured data export isn't even mentioned.

I had the chance to see some internal technical details of the foreign system our solution is based on. It included an XML schema for exporting data, but our people obviously ignored those details.

The low-value procurement trap

Before the Public Procurement Act was passed, the threshold for procurement without a tender was 200,000 kn. Anyone who followed the web scene in those years surely remembers public-sector websites priced at 199,999 kn. The new law lowered the threshold to 70,000 kn. But initially the law also contained one provision that terrified all the contracting authorities who were shocked enough by the lower threshold alone: grouping by CPV code, i.e. the common procurement vocabulary for the subject of procurement.

What did that mean? Suppose a contracting authority wanted to buy 10 laptops at 10,000 kn each. It could do so by splitting the purchase in two, and the whole thing could go through without a tender. The provision in question forbade that: if the value of procurement within a given class exceeded the threshold in one year, a public procurement procedure had to be run. That horrified contracting authorities. Some of them coped - they studied the CPV catalogue carefully, found similar classes, and still managed to break their purchases into several smaller ones. All according to the law. Before the CPV restriction could properly take hold, it was quickly repealed. Someone must have imported it from some foreign law by mistake?!

Nostalgia for the good old days proved too strong, and the 70,000 kn threshold is now going back up to 200,000 kn (for goods and services), while for works it rises to 500,000 kn.

Minister Dalić also announces a proposal that contracts and plans for low-value procurement be published on the contracting authority's website. She seems to have forgotten about the calls for bids themselves. The biggest problem with low-value procurement is that its organisation and execution are left entirely to the contracting authority, which sets its own rules and has almost no legally defined obligations.

Some contracting authorities, such as HRT or the City of Zagreb, publish low-value procurement calls on their websites and let interested businesses participate. Others don't do that at all, and some go to such extremes that their own rulebook allows bids to be collected by verbal agreement.

The Official Gazette of the City of Velika Gorica states:

Bids are collected by means of a written request sent to the addresses of economic operators in a verifiable manner, by e-mail, or by verbal agreement.

How do they achieve verifiability of a verbal agreement?

The gazette's guidelines for low-value procurement also say:

For any low-value procurement procedure, the contracting authority may publish a call for bids on its website.

The key word is may, which in practice translates into doesn't have to, so the City of Velika Gorica has not a single low-value procurement call on its website.

If any contracting authority wrote in its procurement guidelines that all low-value purchases are contracted personally by the mayor, the municipal head or any other person, that would be perfectly legal under the current Public Procurement Act.

"Everything is according to the law" is the usual phrase politicians defend themselves with. What they're really telling you is that there's nothing you can do about it and that they do what suits them, not what's in the public interest.

Can the new law change anything to prevent this kind of practice, or is this just the usual touching-up of public procurement, a show for a naive public?

21. 11. 2016.

A newsletter recommendation

I'm not much of a newsletter reader. I mostly skim them; if something catches my eye I click on it, and some I leave for later. The "for later" ones usually never get their turn. There are also those full of interesting links that are best avoided, because they'll eat half a day before you get through everything. ;-)

One day on Hacker News I came across a rather popular link to the Be Kind post. Since there was no RSS feed link anywhere, I decided to subscribe to The Monday Mailer. I assumed it would share the fate of all the others, but it didn't...

When it lands in my mailbox on a Monday, I always set aside those few minutes to read it. Brian writes in an engaging way, never too long, about everyday things, mostly about work and hobbies, and he gives useful advice. Positive stuff. Not for those who are in hater mode against everything around them. He reminded me a little of my German teacher from secondary school. She liked to say that life is made of small things, and that we should take joy in them.

Before subscribing you can check the archive of The Monday Mailer. The content in the archive is published later than the real newsletter, so right now I'm two weeks ahead of you. But you can catch up with me as soon as next Monday...

P.S. The background music for this blog post was a playlist Mrak recommended in his newsletter.

18. 11. 2016.

Does Microsoft love Linux?

Microsoft has released SQL Server vNext CTP1 for Linux. From a distance, through the media, it looks as if Microsoft really does love Linux, open source and everything it once claimed was a cancer that infects everything it touches. The same Microsoft that came up with the Embrace, extend and extinguish strategy???

Microsoft is a company answerable to its shareholders, who expect it to make money. There isn't much room for emotion there. Just business. Microsoft's opening up and slow expansion to other platforms (Visual Studio for Mac was announced these days as well) is a business necessity, satisfying the constant need for growth and expansion. Whether it's Linux, open source or some similar menace matters less - as long as money can be smelled in that direction. Of course SQL Server for Linux doesn't mean a complete turnaround in strategy, but Microsoft can't afford to miss something or arrive somewhere too late. The mobile platform debacle taught them a lesson.

The keyboards of portal columnists are glowing hot. All sorts of things are being written; I don't know what's taking the one with the piece 12 things SQL Server can learn from Game of Thrones so long, and the one writing What Microsoft learned from the movie Troll seems to be having a creative crisis.

But there is 8 no-bull reasons why SQL Server on Linux is huge for Microsoft. I would partly agree with point 3 (This is a slap at Oracle), except that it isn't a slap - Microsoft wants to attack Oracle. It makes sense: the old dinosaur has slowed down, and a new generation of managers has grown up in companies, people with no reverence for Oracle who will happily replace it with the somewhat cheaper, but still enterprise grade, SQL Server.

Point 4 already verges on nonsense. SQL Server is nowhere near being a threat to MySQL/MariaDB and PostgreSQL. The ecosystem of applications built on those databases has no need to move to SQL Server. The mindset and working style of most of those developers is quite different from what Microsoft prescribes and from how you work with SQL Server. Native database clients are missing too - in its Python examples Microsoft points to ODBC. I'm not sure developers are dying to use it.

The remaining hot-air points from that article aren't even worth discussing.

Where is Microsoft heading? Is there a danger that what one Reddit user suggested comes true - that Microsoft switches to the Linux kernel, uses Wine for compatibility and support of old applications, and that Windows becomes a desktop environment (like GNOME, KDE, Unity) for Linux? Never say never. If the share of revenue from Windows falls and maintaining it becomes too expensive, that is one of the possibilities. And then the love will be even stronger. A love measured in small, green pieces of paper...

16. 11. 2016.

A haphazard procurement of Parliament's tablets

The Croatian Parliament has published a call for bids for the procurement of tablet computers. The idea of MPs using tablets instead of piles of paper is a very good one and I support it, but as always it seems to me that a good idea could be ruined by poor execution. Regarding those papers, I once suggested playing a prank on the MPs and delivering piles of blank paper to their desks, with only the cover page printed on top. How many of them would notice that nothing is written on the pages?

Quantity

The first thing I noticed is that exactly 151 tablets are requested. In larger purchases of devices it's usual to order a few extra units (because of possible failures and similar mishaps) or to require replacement devices from the supplier.

Warranty

There are those who claim that after 2 years the tablets will be unusable and obsolete; I don't think that has to be the case. I own a two-and-a-half-year-old Sony Xperia Z2 tablet, I use it mostly for reading documents, a little text input and for viewing multimedia, and I have no doubt it will serve well for another 2 years. Parliament should ask for an extended warranty on the devices, with a 4-year guarantee.

Screen aspect ratio and technical specifications

The call specifies a resolution of 1280x800. That's a common resolution for Android tablets, but for the intended use they should require a 4:3 aspect ratio, which is far better suited to reading documents, since documents are closer to that ratio.

There should be at least 2GB of RAM and 32GB of storage.

Applications

Without suitable applications and good instructions for using them, for most MPs the tablets will only gather dust (remember how much trouble the voting system alone gave them). Developing a custom application to support their work would surely cost more than the devices themselves. But the MPs don't need a special application.

The problem being solved is the distribution of materials. Those materials should be publicly available anyway, so the simplest solution is a public repository of working and other materials that would be synchronised to the MPs' tablets with a simple client. The solution should be open source, and there are several good options: Nextcloud, ownCloud, Seafile.

Parliament already has some edoc solution based on proprietary code, and even at first glance it's clear that the open source solutions I've suggested are good enough, if not better.

The cheapest conclusion

The main criterion is the cheapest bid (which on the one hand is logical), but with requirements like these they could end up with "junk". This is only a call for bids, and there's still time to fix the omissions before the tender, but past experience is not encouraging. They could also add a condition that the devices be manufactured in the EU (or even in Croatia), and then we might end up with repackaged "Chinese junk". Hmm, which of our domestic companies has specialised in that?

15. 11. 2016.

Emoji as the ultimate feature

Mozilla Firefox, in version 50.0 (maybe it would be better to start labelling versions the way Ubuntu does, by date), brings a feature that will surely improve your user experience: built-in emoji for Linux. Without it you really couldn't browse properly.

They're not the first: OS X has long boasted excellent emoji support, iOS isn't far behind, Android shows off its ugly emoji, and let's not even mention how many words have been typed about the poop emoji in Windows!!!

Soon, when we're all driving smart cars, the most common excuse for being late to work will be:

Sorry boss, my car stopped halfway here because it had to download a critical update - new emoji arrived.

Emoji won't stop there; this is only the first step in their evolution. Soon your computers, phones and cars will be 3D printing emoji, and they'll be little IoT things. We need new soldiers for DDoS attacks.

And then one day the video game veterans who played Elite and completed the Trumble mission will experience a strange déjà vu.

10. 11. 2016.

What traps does Google AMP hide?

The Google AMP project is supposed to solve the problem of slow-loading mobile pages. The AMP JS library should ensure fast rendering of AMP pages, and the Google AMP Cache even faster serving. Sounds good?

If you're in a senior position at a company where you make the final call on web projects, and you don't really understand all those technical details, you may have thought this is the right way to go. After all, it's the mighty Google, they have great programmers, surely better than yours...

The truth is that any average web developer can build a lighter (and in theory even faster) web page - if only you let them.

If you compare the average weight of a regular page and an AMP page, you'll see that a regular WIRED page weighs 2.7MB, while the same page as AMP weighs only 0.6MB. But a bit over 100KB of that is AMP JS. In the worst case your developer could build a 0.5MB page with plain HTML/CSS, and they would also move the CSS into a separate file, saving a further 21KB on every page load. What additionally complicates building an AMP page is the special markup. You can't reuse your desktop code; you have to build some solution for converting it into AMP-compatible code (for example, the amp-img tag is used instead of img). JavaScript isn't allowed either, even though AMP itself requires its JavaScript library. Google has imposed restrictions and makes you dance to its tune. It hasn't made building a web page easier - it has made it harder.

A curious detail in this story is that Google is abandoning its mantra about the ideal of one responsive page for all devices and is quietly bringing back the split into a mobile and a desktop web.

The Google CACHE trap

When a visitor clicks an AMP link in the search results, they will most likely not land on your website; Google will serve them the page from its own cache instead. It's as if the AMP restrictions were designed precisely so that pages are as easy as possible to cache?! Ahhh, so that's why the CSS has to be inline?!!

Alex Kras was one of the first to point out that problem: Google May Be Stealing Your Mobile Traffic. The AMP team got in touch with him - they even invited him to lunch to explain to him that he's wrong.

It would be nice if users could turn the cache off, but that isn't possible. In their words, the cache is a key element of AMP, and if you opt out by removing the AMP markup, your page won't get the AMP icon or the special placements reserved for AMP pages.

What you should do, if you decide to build an AMP page, is give the user a path to the rest of your content (by adding a menu with amp-sidebar, or a carousel showcasing other articles). The collateral victims here could be the various content sharing widgets, because once you've caught the user in your AMP net it won't be in your interest to send them elsewhere.

The SEO advantage

AMP pages will not get an advantage in the search results, Google says. Apart from the fact that those pages get special badges and special positions in the widgets at the top of the results page. It has also been announced that page load speed could become a ranking factor. Doesn't it seem to you that they will have an advantage after all?!

Do you need AMP?

Is the traffic to your website extremely important because your business results largely depend on it? If the answer is yes, then you need AMP, because if you don't implement it your competition will, and they could gain a significant advantage. Maybe Google will give up on AMP in a year - but who can know that?

Can it be done differently?

Of course it can; all it takes is the will to change things and to stop doing them the wrong way.

You don't need AMP for load speed; it's enough to build smaller and better pages using ordinary web standards and technologies. AMP is a classic lobotomy of the web, and I think both users and developers deserve better than that.

06. 11. 2016.

Yet another new beginning

I stopped writing blog posts because I wanted to honor that Eating your own dog food principle and use a solution based on the Django framework instead of Wordpress. That was not a simple task, because I couldn't find a suitable blog application for Django. Not that there aren't any, but I would find some flaw in each of them. At one point I started writing my own application, but then I got hit by the other rule, the one that says 90% of the functionality takes 10% of the time, and the remaining 10% of the functionality takes the other 90%. I got stuck in those 10%.

In the meantime a few new and good Django applications appeared, so I narrowed the choice down to two: Mezzanine CMS and Puput. Mezzanine was quite problematic (the Wordpress import kept breaking, problems with the comments application), and with Puput I stopped after solving several problems with the latest libraries, finally giving up because of a bug in a library Puput depends on. Puput is based on the Wagtail CMS, and if you need a good Django CMS without too much legacy spaghetti code, Wagtail is an excellent choice. It is a classic page-based CMS that comes without batteries (you need to learn how Wagtail applications are written and write some code), but it has an excellent admin that you can easily extend with a few lines of Python code.

Estimating that I would spend more time on research and customization, I went back to my own code after all, adapted it for the new Django, patched up the HTML/CSS code based on Pure.CSS, deployed it to the server and, after almost three years, started writing this post.

A lot has changed in the IT landscape since the last post, let alone since the time when I published posts almost daily. Some things have stayed the same, mostly the ones concerning the computerization of our public administration. They may even be slightly worse. My Trello board is full of ideas and topics.

The new blog brings a simple design where the content comes first, there will be no ads, and the only change is in commenting - comments won't be published immediately, only after review, so I ask for a little patience at first. A simple karma system has been set up that should let trusted users have their comments published immediately. And that's it for now. Tomorrow is a new week and a chance to start blogging regularly again. :-)

22. 07. 2016.

etckeeper, bind jnl files and git-pack memory problems

For the last few years, one of the first tools we install on each new server has been etckeeper. It has saved us a couple of times, and it provides nice documentation of the changes on the system.

However, git can take a lot of space if you have large files which change frequently (at least daily, since etckeeper has a daily cron job to commit the changes made that day). In our case, we have bind storing its jnl files in /etc/bind, which results in about 500 KB of changes each day for the 11 zones we have defined.

You might say that doesn't sound so bad, but in four months we managed to grow the repository from 300 MB to 11 GB. Yes, that is not a mistake, it's 11000 MB, a 36-fold increase! The solution is to run git gc, which will in turn call git-pack to compress the files. But this is where the problems start -- git needs a lot of RAM to do the gc. Since this machine has only 1 GB of RAM, that is not enough to run git gc without running out of memory.

The last few times, I transferred the git repository to another machine, ran git gc there and then transferred it back (resulting in a nice decrease from 11 GB back to 300 MB), however, this is not an ideal solution. So, let's remove the bind jnl files from etckeeper...

Let's start with our 11 GB git repo and copy it to another machine which has the 64 GB of RAM needed for this operation.

root@dns01:/etc# du -ks .git
11304708        .git

root@dns01:~# rsync -ravP /etc/.git build.ffzg.hr:/srv/dns01/etc/
Now, we will re-create the local files, because we need to find out which jnl files are used so we can remove them from the repo.
root@build:/srv/dns01/etc# git reset --hard

ls bind/*.jnl | xargs -i git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch {}'

echo 'bind/*.jnl' >> .gitignore
git commit -m 'ignore bind jnl files' .gitignore
Now, finally we can shrink our 11 GB repo!
root@build:/srv/dns01/etc# du -kcs .git
11427196        .git

root@build:/srv/dns01/etc# git gc
Counting objects: 38117, done.
Delta compression using up to 18 threads.
Compressing objects: 100% (27385/27385), done.
Writing objects: 100% (38117/38117), done.
Total 38117 (delta 27643), reused 12846 (delta 10285)
Removing duplicate objects: 100% (256/256), done.

root@build:/srv/dns01/etc# du -ks .git
414224  .git

# and now we can copy it back...

root@dns01:/etc# rsync -ravP build.ffzg.hr:/srv/dns01/etc/.git .
Just as a side note, if you want to run git gc --aggressive on the same repo, it won't finish with 60 GB of RAM and 100 GB of swap, which means it needs more than 150 GB of RAM.

So, if you are storing modestly sized files which change a lot, keep in mind that you might need more RAM to run git gc (and keep disk usage under control) than you actually have.

20. 06. 2016.

Samsung Galaxy S2 vs Ubuntu PC performance

Introduction 

(this post has been updated in 2016)

It seems that many people assume that a 1.2 GHz dual core mobile ARM CPU should be almost as fast as a PC CPU running at a similar frequency. They're wrong.

ARM cores are indeed more power efficient per square mm of die area on the same production process than Intel x86 and AMD64 architecture processors. Most of the efficiency comes from a simpler and more space-efficient instruction set, but that advantage typically benefits only the front-end of the CPU, which is not the biggest spender of those precious milliwatts.

The other reasons why modern dual or quad core mobile phones can run on a fraction of the power that notebook or desktop (PC) CPUs need:

RAM speed significantly impacts many parts of phone performance. Executing complex JavaScript, image or video processing, and web page rendering are just some of the tasks that benefit significantly from having more RAM bandwidth.

Your ARM device having significantly less RAM bandwidth is also a big reason why you will probably avoid developing software on your new shiny ASUS Transformer Prime tablet/laptop (though I would certainly try :) )

So how much slower is your Android cell phone RAM than your PC RAM?


Unfortunately, I couldn't find any RAM benchmarking software that would run both on a Linux PC and on an unrooted Android device. There is a nice port of NBench, but NBench is a bigger benchmark and it needs some time before it prints out the one thing we need, the memory index. Also, it doesn't output a MB/sec number, which is unfortunate, since it's a really clear metric.

So I found the really simplistic mbw (apt-get install mbw), made it even simpler (removed the memcpy tests and left only the dumb array assignment part), and made an Android NDK version of it.


RAMbandwidth

Source here. Be sure to close any apps before running it on a PC or on your phone. The default array size being copied is 20 MB (the app needs 40 MB to perform the test) to better support low-memory devices.
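If you just want a rough number on a PC without building anything, a few lines of numpy approximate the same dumb-array-assignment measurement (this is my sketch, not the actual mbw or RAMbandwidth code, so treat the result as a ballpark figure only):

# Rough RAM bandwidth estimate via plain array assignment (a sketch, not mbw itself).
import time
import numpy as np

SIZE_MB = 200                     # lower this on low-memory devices
REPS = 20

src = np.ones(SIZE_MB * 1024 * 1024 // 8, dtype=np.float64)
dst = np.empty_like(src)

best = float('inf')
for _ in range(REPS):
    t0 = time.perf_counter()
    dst[:] = src                  # plain element-wise assignment, like mbw's dumb copy
    best = min(best, time.perf_counter() - t0)

print("~%.0f MB/sec" % (SIZE_MB / best))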

Here are some results (20 MB array size, average of 20 repetitions, run as "mbw -t1 20 -n 20", default settings in RAMbandwidth; on some larger boxes a 200 MB array size was used):
~9000 MB/sec - Intel Core i7-5600U (2x DDR3 1600 MHz)
~8200 MB/sec - Asus N56JR (Intel i7-4700HQ, 2x DDR3 1600 MHz)
~6800 MB/sec - Intel Xeon E5-1650 v2 (4x DDR3 1600 MHz)
~5400 MB/sec - Intel Xeon X3430, DDR3 memory, under moderate MySQL load (2009)
~6000 MB/sec - Thinkpad X230 Core i5 3320M (2x DDR3 1600 MHz)
~6000 MB/sec - LG G5 (4 GB LPDDR4, 2016; varies between 5800-6500)
~3800 MB/sec - Core i3-2310M (2x DDR3 1333 MHz)
~2200 MB/sec - Intel Core 2 E8200, PC2-6400 DDR2 RAM, desktop PC (2008)
~1100 MB/sec - Intel Core Duo L2400, PC2-5300 DDR2 RAM on a Thinkpad X60S laptop (2006)
and our mobile contenders:

~1500 MB/sec - LG G3 (3GB D855 - It varies from 800-1700)
~1200 MB/sec - Raspberry Pi 3
~690 MB/sec - Doogee Valencia2 Y100 Pro
~530 MB/sec- Raspberry Pi 2
~500 MB/sec - Samsung  Galaxy S2 (2011)
~250 MB/sec - HTC Desire (2010)
~120 MB/sec - Raspberry PI (2012, under X, fbdev 720p it falls to ~90 MB/sec) 
~55 MB/sec - HTC Magic (2009, had to use smaller 10MB array size because of limited RAM available) 


The Samsung Galaxy S2 sometimes reports around 440 MB/sec, and sometimes 550 MB/sec. I guess it depends on where the kernel allocates the memory; maybe one of the memory banks shares the bus with the GPU, the GSM CPU or some other greedy device.

It should be easy to post some test results of your own hardware, so please share. 

EDIT: Check comments for some more results



17. 05. 2016.

Let's hack cheap hardware - 2016 edition

Last week I had the pleasure of presenting at two conferences in two different cities, DORS/CLUC 2016 and Osijek Mini Maker Faire, on the topic of cheap hardware from China which can be improved with a little bit of software or hardware hacking. It was well received, and I hope that you will find a tool or two in it which fills your need.

I hope to see more hacks of STM8-based devices, since we have the sdcc compiler with support for stm8, a cheap SWIM programmer in the form of the ST-Link v2 (Chinese clones, which are also useful as ARM SWD programmers), and the STM8 has features comparable to 8-bit AVR micro-controllers while being cheaper.

14. 05. 2016.

Cable internet and oversubscription


This is a post from 5 November 2014. In the meantime I have switched cable operators.

If you have cable internet, you are most likely using one of the following cable technologies for digital data transmission:
- the DOCSIS 1.0, 1.1, 2.0, 3.0 or EuroDOCSIS standards
- the PacketCable 1.0, 1.5, 2.0 standards, which build services such as telephony and digital television on top of DOCSIS

The frequency band of each cable is divided into channels. The channel width depends on the standard: EuroDOCSIS uses the European channel width of 8 MHz, while DOCSIS uses the American 6 MHz.

Division of the coaxial cable bandwidth (the maximum downstream bandwidth of a coaxial cable is 4864 megabits, per the example below)


All the DOCSIS transport standards mentioned have similar characteristics in terms of downstream throughput per megahertz, so DOCSIS supports 38 megabits per download channel, and EuroDOCSIS 50 megabits per download channel.

DOCSIS 1.1 brought better standardization and the ability to control quality of service (QoS).

DOCSIS 2.0 brought better upload speeds (27 megabits per channel, compared to 9 megabits per channel for DOCSIS 1.0).

DOCSIS 3.0 brought the ability for a single user to use multiple channels at the same time, thus increasing bandwidth.

DOCSIS 3.1, released in October 2013, is the first major change to the standard: it introduces a new 4096 QAM modulation, abandons the division into 6 or 8 MHz channels in favor of smaller OFDM subchannels, and under ideal conditions supports speeds of up to 10 gigabits downstream and 1 gigabit upstream. It is not yet in use.

Now, all of that is fine and dandy, but with such huge numbers, why is my internet slow?

A coaxial cable is a medium shared with other users. Unlike DSL, where each modem has its own copper pair to the exchange, on cable networks we share the medium with an unspecified number of users known only to your ISP. The operator usually also offers cable television, so the space left for your internet is squeezed by the number of channels used for the TV service.

Let's look at an example in practice in the Zagreb area, for downstream:



Motorola SBV5121E

The modem used is a Motorola SBV5121E (DOCSIS 2.0 and lower), which according to the specification [2] means it has downstream bandwidth from 88 to 860 MHz with the American channel width of 6 MHz. That gives 772/6 = 128 channels. To my knowledge, the operator I analyzed carries 40 analog TV channels and 113 digital ones. Let's say those 113 digital channels take up 30 of the 6 MHz channels in the cable. That means that on, say, a Wednesday evening, when people come home from work and school, only 58 different households (channels) can surf at the full speed of 38 megabits at the same time; every additional user who starts surfing reduces the speed for the others.
Latency graph (to the first hop) for a Zagreb ISP while the user is not using the service except for the measurement.

The operator I analyzed sells 8 megabit plans, which means it should theoretically be able to provide the requested bandwidth for (38/8) * 58 = 275 users. But since the time each user spends on a channel has to shrink in order to split one channel among several households, in those cases, even if only 275 households are surfing, their latency (ICMP ping) starts to grow from an excellent 6-7 ms towards (worst case, full utilization at 418 users) 4.75 * 7 = 33 ms (corrections welcome if the math is wrong; I am assuming the smallest ICMP packet size, i.e. the smallest discrete unit in which communication is possible).
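To make the arithmetic easier to follow, here are the same back-of-the-envelope numbers in a few lines of Python (only the figures from this example):

# Downstream channel math for the example above (DOCSIS, 6 MHz channels).
downstream_mhz = 860 - 88                  # usable downstream spectrum, 88-860 MHz
channels = downstream_mhz // 6             # 128 channels of 6 MHz
free_channels = channels - 40 - 30         # minus analog TV and digital TV channels -> 58

per_channel_mbit = 38                      # DOCSIS downstream capacity per channel
plan_mbit = 8                              # the speed the ISP sells
full_speed_users = free_channels * per_channel_mbit // plan_mbit   # ~275 users at full speed

print(channels, free_channels, full_speed_users)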

An additional problem is that DOCSIS 2.0 and lower do not allow fast switching between channels, which makes good utilization of the cable's frequency spectrum considerably harder (other channels might have significantly more room for data transfer).

In any case, when too many users share the same resource (the same 6 MHz channel, the same cable), access speed drops drastically: with the ISP I analyzed, bandwidth falls below 1 megabit and ping goes above 140 ms, with frequent packet loss.

It is rare for all users to want maximum bandwidth at the same time and for longer than a few minutes, so it is possible (with the numbers from the example) to have 10x more users than the total capacity would suggest (e.g. 4180 users instead of 418 users at 8 megabits on 88 channels) without the users noticing problems with access speed, but that depends heavily on how the internet is used. It is possible that more remote learning, downloading games over Steam and similar services etc. will significantly change user habits in the future.

The job of sharing bandwidth when there are more users than free channels is done jointly by the modem at the customer's side and the CMTS device at the operator's side. The CMTS performs many of the functions a DSLAM performs in DSL systems, but adapted to the characteristics of the shared coaxial medium. The CMTS allows up to 1000 users to share the same 6 MHz channel, using a technique called statistical time division multiplexing. I couldn't find any data on whether a single CMTS device can really fill all 128 downstream channels plus 60 MHz of upstream bandwidth. It would certainly need at least a 10 Gbit Ethernet interface to do so.

An ISP can improve its infrastructure by reducing the number of users sharing a single cable, or by increasing the number of channels used for DOCSIS if the medium has free channels.
Also, the ISP can move to digital TV to take advantage of digital video and audio compression and thus cut the bandwidth needed per TV channel by at least a factor of 4 (possibly more with compression more advanced than MPEG2), but this means the operator has to replace the TV receivers of all its customers, which can be a significant investment.

A Facebook group has also been set up where users can complain about their operator or discuss better operators and technologies such as FTTH or VDSL.

Join us at:

https://www.facebook.com/groups/hocuboljiinternet/


Links:

[1] http://en.wikipedia.org/wiki/DOCSIS
[2] http://www.wiretechsa.com.ar/PDF/equipamientointernet/SBV5121.pdf
[3] http://computer.howstuffworks.com/cable-modem.htm
[4] http://www.lightreading.com/cable-video/docsis/docsis-31-whats-next/d/d-id/708425
[5] http://www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/cable-high-speed-data-hsd-solutions/gateway_to_connected_life_white_paper.pdf





01. 05. 2016.

Smart public transport with small automated, semi-automated or manually driven vehicles

Here's just an idea (feel free to use it in any way):

Imagine having a network of small (4-6 passenger) vehicles servicing a city for daily transportation needs. Users would enter a desired location and arrival time. The arrival time could be flexible (within an hour; if not, the price could be appropriately higher) and the user would announce any regularity (for example, detailing a weekly commute) that could be used for future planning.

The centralized system would solve the optimization problem of getting all passengers to their respective locations and suggest a departure time and place (preferably within a few minutes' walking distance).

An interesting open source implementation would use OpenStreetMap data and include simulations and visualizations. A commercial entity could handle deployments in various locations and provide stable software as a service around the core open implementation. Autonomous vehicles would make such a network much more efficient to operate and lower the costs significantly.

19. 01. 2016.

Debian OpenLDAP with GnuTLS and OpenSSL certificates

Every few years we have to renew SSL certificates, and there is always something that can go wrong. So I decided to reproduce the exact steps here so that Google can find them for the next unfortunate soul who has the same problem.

Let's examine old LDAP configuration:

deenes:/etc/ldap/slapd.d# grep ssl cn\=config.ldif 
olcTLSCACertificateFile: /etc/ssl/certs/chain-101-mudrac.ffzg.hr.pem
olcTLSCertificateFile: /etc/ssl/certs/cert-chain-101-mudrac.ffzg.hr.pem
olcTLSCertificateKeyFile: /etc/ssl/private/mudrac.ffzg.hr.gnutls.key
We need to convert the OpenSSL key into a format which GnuTLS understands:
deenes:/etc/ssl/private# certtool -k < star_ffzg_hr.key > /tmp/star_ffzg_hr.gnutls.key
Then we need to create a certificate file which includes our certificate and the required chain in the same file:
deenes:/etc/ldap/slapd.d# cat /etc/ssl/certs/star_ffzg_hr.crt /etc/ssl/certs/DigiCertCA.crt > /etc/ssl/certs/chain-star_ffzg_hr.crt
We are not done yet. OpenLDAP doesn't run with root privileges, so we have to make sure that its user is in the ssl-cert group and that our certificates have the correct permissions:
deenes:/etc/ldap/slapd.d# id openldap
uid=109(openldap) gid=112(openldap) groups=112(openldap),104(ssl-cert)

deenes:/etc/ldap/slapd.d# chgrp ssl-cert \
/etc/ssl/certs/DigiCertCA.crt \
/etc/ssl/certs/star_ffzg_hr.crt \
/etc/ssl/certs/chain-star_ffzg_hr.crt \
/etc/ssl/private/star_ffzg_hr.gnutls.key

deenes:/etc/ldap/slapd.d# chmod 440 \
/etc/ssl/certs/DigiCertCA.crt \
/etc/ssl/certs/star_ffzg_hr.crt \
/etc/ssl/certs/chain-star_ffzg_hr.crt \
/etc/ssl/private/star_ffzg_hr.gnutls.key

deenes:/etc/ldap/slapd.d# ls -al \
/etc/ssl/certs/DigiCertCA.crt \
/etc/ssl/certs/star_ffzg_hr.crt \
/etc/ssl/certs/chain-star_ffzg_hr.crt \
/etc/ssl/private/star_ffzg_hr.gnutls.key
-r--r----- 1 root ssl-cert 3764 Jan 19 09:45 /etc/ssl/certs/chain-star_ffzg_hr.crt
-r--r----- 1 root ssl-cert 1818 Jan 17 16:13 /etc/ssl/certs/DigiCertCA.crt
-r--r----- 1 root ssl-cert 1946 Jan 17 16:13 /etc/ssl/certs/star_ffzg_hr.crt
-r--r----- 1 root ssl-cert 5558 Jan 19 09:23 /etc/ssl/private/star_ffzg_hr.gnutls.key
Finally, we can modify the LDAP configuration to use the new files:
deenes:/etc/ldap/slapd.d# grep ssl cn\=config.ldif 
olcTLSCACertificateFile: /etc/ssl/certs/DigiCertCA.crt
olcTLSCertificateFile: /etc/ssl/certs/chain-star_ffzg_hr.crt
olcTLSCertificateKeyFile: /etc/ssl/private/star_ffzg_hr.gnutls.key
We are done, restart slapd and enjoy your new certificates!
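To verify the result without guessing, the STARTTLS handshake can be checked with a few lines of python-ldap (a sketch; the hostname is just an example, point it at your own server):

# Minimal STARTTLS sanity check using python-ldap (hostname is an example).
import ldap

ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_DEMAND)  # verify the chain
conn = ldap.initialize('ldap://mudrac.ffzg.hr')
conn.start_tls_s()             # fails here if the key, chain or permissions are wrong
conn.simple_bind_s()           # anonymous bind over the now-encrypted connection
print(conn.whoami_s())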

24. 09. 2015.

FSec 2015 - Raspberry PI for all your GPIO needs

When I started playing with Raspberry Pi, I was a novice in electronics (and I should probably note that I'm still one :-).

But since then I did learn a few things, and along that journey I also figured out that the Raspberry Pi is a great little device which can be used as a 3.3V programmer for AVR, JTAG, SWD or CC111x devices (and probably more).

I collected all my experiences in the presentation embedded below, which I had the pleasure of presenting at the FSec conference this year. I hope you will find it useful.

20. 06. 2015.

DORS/CLUC 2015: AVR component tester

A few weeks ago we had our annual DORS/CLUC 2015 conference, at which I gave a (hopefully) interesting presentation about an AVR component tester. Since then, we have received the video recording of the conference, so below you can find the embedded presentation and video recording (in Croatian).

30. 01. 2015.

Overview of ganeti cluster from command line: ps, kvm, proc and tap

We have been running a ganeti cluster at our institution for more than a year now. We did two cycles of machine upgrades during that time, and so far we have been very pleased with the capabilities of this cloud platform. However, last week we had a problem with our instances -- two of them got owned and started generating a DoS attack against external resources. From our side it seemed at first like our upstream link was oversaturated, and we needed a way to figure out why.

gnt-info.png

At first, it seemed like this would be easy to do. Using dstat, I found that we were generating over 3 Gb/s of traffic to the outside world every few seconds. We have a 1 Gb/s upstream link, but our bonded interfaces on the ganeti nodes can handle 3 Gb/s of traffic, so for a start we were saturating our own link.

But which instance was doing it? I had to run dstat on every node in our cluster until I found two nodes with instances that were overloading our link. Using iftop I was able to get the hostname and IP address of the instances I wanted to shut down. However, this is where the problems started. We didn't have DNS entries for them, and although I had the IP and MAC address of the instances, I didn't have an easy way to figure out which instance had which MAC.

Then I figured out that I can get the MAC from kvm itself, using ps. Once I found the instances it was easy to stop them and examine what had happened to them.
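The ps trick is easy to script; here is a rough sketch (mine, not gnt-info) that walks /proc and pairs each kvm process's -name argument with any mac= values on its command line (it assumes a typical kvm invocation, so adjust it to your setup):

# Map running kvm instances to MAC addresses by scanning /proc/*/cmdline.
import glob
import re

for path in glob.glob('/proc/[0-9]*/cmdline'):
    try:
        with open(path) as f:
            args = f.read().split('\0')
    except IOError:
        continue
    if not args or 'kvm' not in args[0]:
        continue
    name = args[args.index('-name') + 1] if '-name' in args[:-1] else '?'
    macs = re.findall(r'mac(?:addr)?=([0-9a-fA-F:]{17})', ' '.join(args))
    print('%s %s' % (name, ','.join(macs)))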

But this got me thinking. Every time I have a troubleshooting problem with ganeti, I basically use more or less the same command-line tools to figure out what is going on. But I didn't have a tool which would show me some basic stats about an instance, including MAC addresses and network traffic (which in our configuration means tap devices added to bridges). So I wrote gnt-info, which presents a nice overview of the instances in your ganeti cluster and which you can grep to drill down into a particular instance or host.

28. 01. 2015.

Junk Social

At some point, a couple of months ago, I noticed that my Facebook feed devolved into 9GAG reposts and random things about people I sometimes knew, often not. Twitter wasn't any better — it was mostly flame wars about the web development issue du jour[0].

This wasn't the case of me just not grooming the feeds. I heavily curated the Twitter accounts I followed, and my Facebook friends are my actual real-life friends (or at least real-life acquaintances I'm on a friendly basis with).

The trouble was — I still spent a lot of time on both Twitter and Facebook! It's easy to get drawn into a Twitter conversation, or follow trails of meme images. And before you know it, half an hour has passed. While sometimes something genuinely interesting[1] came up, the signal-to-noise ratio was too low.

So I decided to just visit occasionally, a few times a week. I'd go over the interesting stuff, ignore the pointless drivel, still get some value out of the experience, have fun and not waste too much time.

But coming back to Twitter and Facebook after a few days felt like watching a soap opera after a few days' pause. Nothing much had happened in the meantime — certainly nothing worth digging for in the unwieldy Twitter and Facebook user interfaces[2]. Having stepped out of the stream, I found it even less appealing. It was boring and I stopped coming back.


In the above description, there's very little “social”. While Twitter and Facebook are social in the sense that people communicate over them, in most[3] cases the communication is so shallow[4] and ephemeral that it becomes meaningless. It's an endless, constant chit-chat.

It's Junk Social. Like junk food, it satisfies the immediate need, in this case for social contact, but its nutritional value for the psyche is low[5]. And like junk food, it's easy to overindulge.

I unapologetically eat junk food — in small amounts. And I do think Junk Social can have value (and be fun) — in small amounts.


[0] http://xkcd.com/386/

[1] Like Postmodern Jukebox

[2] While they put a lot of effort into the experience of the user consuming Now, it's obvious the use case of someone digging through Past is not high on the priority list.

[3] Notable exception is coordinating something in real-life, like setting up a meetup, pinging friends to go to the movies, organizing a charity drive or staging a revolution. The value here is always tied to the real-world behaviour, though, and using the social network as a communication tool, which is not exactly new — email, IRC, forums have all been used for this for decades.

[4] Twitter practically enforces this with their 140-character limit. It's impossible to have a thought-out, insightful conversation there.

[5] Standard disclaimer applies: this is only my opinion, and I'm neither a nutritionist nor a psychologist.

18. 12. 2014.

Controlling 315 MHz light sockets using Arduino

We all read Hackaday, and when I read the Five Dollar RF Controlled Light Sockets post I decided that I had to buy some. However, if you read the original Cheap Arduino Controlled Light Sockets - Reverse Engineering RF post, and especially the comments, you will soon figure out that ordering the same-looking product from China might bring you something similar but with different internals.

In my case, all four light sockets turn on or off with any button press on the remote, which was a shame. When I opened the remote and a socket, I got another bad surprise. My version didn't have any SPI EEPROM, just two chips: ST F081 FB 445 in the remote and ST ED08 AFB422 in the light socket (hidden below the receiver board in the picture).

remote.jpg socket-top.jpg socket-bottom.jpg

But I had already acquired two sets, so I wanted to see what I could do with them. Since I couldn't read an EEPROM to figure out the code, I decided to use rtl-sdr to sniff the radio signals and try to command the sockets using a cheap 315 MHz Arduino module.

I used gqrx to sniff the radio signals and I was not pleased. The remote drifted all over the place, mostly around 316 MHz, and it took some trial and error to capture the signals generated when buttons are pressed. However, I verified that it sends the same signal multiple times no matter which key I press (which would explain why four pins on the remote are soldered together).

After a while I had two traces (since I have two sets of light sockets) and could decode the binary data being sent, shown in the following picture:

signals.png

Now I knew that one set transmits 1000100110110000000000010 and the other 1011001001011111000000010. From looking at the timing in Audacity, it seemed that each bit is encoded as a short-long or long-short sequence, where the short pulse is about a third of the long one, and one bit takes about 1200 µs. I cheated a little here and hooked a scope onto the transmit trace on the remote to verify the pulse lengths, just to be sure.

So as the next step I wrote a simple Arduino sketch to try it out:

#define TX_PIN 7
#define LED_PIN 13

char *code = "1000100110110000000000010";
//char *code = "1011001001011111000000010";

void setup() {
  pinMode(LED_PIN, OUTPUT);
  pinMode(TX_PIN, OUTPUT);
}

void loop() {
  digitalWrite(LED_PIN, HIGH);

  for(int i = 0; i < strlen(code); i++) {
    int i1 = 300;
    int i2 = 900;
    if (code[i] == '1' ) {
      i1 = 900;
      i2 = 300;
    }
    digitalWrite(TX_PIN, HIGH);
    delayMicroseconds(i1);
    digitalWrite(TX_PIN, LOW);
    delayMicroseconds(i2);
  }
  
  digitalWrite(LED_PIN, LOW);  
  delay(3000);
}
So I compiled it, uploaded it to the Arduino and... nothing happened. Back to the drawing board, I guess.

When I was watching gqrx I could see that the signal is sent for as long as I hold the button, up to 10 seconds. From previous experience I know that these cheap receivers need some time to tune into the frequency, so the next logical step was to send the same signal multiple times. And guess what: when I sent the same signal twice with a 2000 ms delay between transmissions, everything started to work.

Well, somewhat. The light socket in the far corner of the hall seemed to have problems receiving the signal, which would put the two light sockets in the hall in opposite states: one would be on and the other off. This was fun, and could be fixed with a simple antenna on the Arduino module (I currently don't have one), but my conclusion is that your IoT device should send different codes for the on and off states so that something like this doesn't happen to you.

Then I got carried away and added commands to change all the parameters so I could experiment with how sensitive the receiver is. You can find the full code at http://git.rot13.org/?p=Arduino;a=blob;f=light_sockets/light_sockets.ino With these experiments I found out that you don't have to be precise with the timings (so my oscilloscope step was really not needed). The receiver works with 500 µs low and 1100 µs high (for a total of 1600 µs per bit) at the high end, down to 200 µs low and 800 µs high (for a total of 1000 µs per bit).

I suspect the chips are some kind of 26-bit remote encoders/decoders, but I can't find any trace of a datasheet on the Internet. This is a shame, because I suspect it's possible to program the light sockets to respond to any code and in theory address each of them individually (which was my goal in the beginning). However, the poor build quality and the use of the same code for the on and off states (combined with poor reception) make me wonder whether this project is worth any additional time.

02. 11. 2014.

Reusing servos from old printers with Arduino

I must confess that I'm a pack rat. When I see an old printer, something inside my head tries to figure out what I can do with all the parts inside it instead of sending it to a landfill. However, I'm a sysadmin and software guy, so JTAGs and programming are more up my alley than hardware. Still, I decided to figure out how to drive one of the motors using an Arduino, and this is my journey through that experience.

So I started with printer disassembly and got one stepper motor with some gears attached. It is a Mitsumi M42SP-6TE. It has four wires and I couldn't find any datasheet for it. So what do I do now?

Mitsumi-M42SP-6TE.jpg

First, some educated guesses. I assumed that it's a 12V motor. This was somewhat influenced by examining the similar Mitsumi MP42SP-6NK motor, which has a rating of 12V or 24V. Using a multimeter and taking resistance measurements between the wires, I confirmed that it has 10 Ω between coils, which means it's bipolar, with two separate coils which both have to be driven at the same time.

stepper-coils.jpg

To connect it to the Arduino, I had acquired a clone of the Adafruit motor shield some time ago. When you buy cheap clones you expect some problems, and mine was the fact that the screw terminals weren't cut flush with the board, so I had to use flat cutters to shorten them and prevent the motor power from shorting against the ICSP header and the USB connector on the Uno. I also put red electrical tape on the USB connector, just to be safe(r).

AFMotor.jpg

I also needed to add a power jumper (white in the picture) to provide power from the Arduino (which in turn is powered by a 12V 1A adapter). However, in this configuration the L293D H-bridge becomes very hot to the touch, so for testing I modified the StepperTest example to give me serial control and powered the Arduino from the USB port (from which it draws 0.42 A, and the stepper still works with a 5V supply, which makes my 12V assumption somewhat questionable). This enabled me to deduce that this stepper is also 7.5° per step, which takes 48 steps for a full turn (a small red dot on the stepper gear helped to verify this). I also verified that the top gear has a 13:1 ratio to the stepper motor, making the gear mechanism useful for smaller movements and better torque.
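A quick sanity check of those numbers (plain arithmetic on the values measured above):

# 7.5 degrees per step and a 13:1 gear ratio, as deduced above.
step_angle = 7.5
steps_per_rev = 360 / step_angle        # 48 steps for a full motor turn
gear_ratio = 13

print(steps_per_rev)                    # 48.0
print(steps_per_rev * gear_ratio)       # 624 steps per revolution of the top gear
print(step_angle / gear_ratio)          # ~0.58 degrees of output movement per step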

I hope this blog post will motivate you to take old printers, scanners, faxes and similar devices apart and salvage the useful parts. Reusing the printer's own boards for driving steppers is also very interesting, but this particular printer didn't come with a power supply (and it has a strange connector), and the driver chip on it doesn't have any publicly available documentation, so that will have to wait for some other printer which decides to give up its parts for my next project...

22. 10. 2014.

SysV init on Arch Linux, and Debian

Arch Linux distributes systemd as its init daemon, and has deprecated SysV init in June 2013. Debian is doing the same now and we see panic and terror sweep through that community, especially since this time thousands of my sysadmin colleagues are affected. But like with Arch Linux we are witnessing irrational behavior, loud protests all the way to the BSD camp and public threats of Debian forking. Yet all that is needed, and let's face it much simpler to achieve, is organizing a specialized user group interested in keeping SysV (or your alternative) usable in your favorite GNU/Linux distribution with members that support one another, exactly as I wrote back then about Arch Linux.

Unfortunately I'm not aware of any such group forming in the Arch Linux community around sysvinit, and I've been running SysV init alone as my PID 1 since then. It was not a big deal, but I don't always have time or the willpower to break my personal systems after a 60 hour work week, and the real problems are yet to come anyway - if (when) for example udev stops working without systemd PID 1. If you had a support group, and especially one with a few coding gurus among you most of the time chances are they would solve a difficult problem first, and everyone benefits. On some other occasions an enthusiastic user would solve it first, saving gurus from a lousy weekend.

For anyone else left standing at the cheapest part of the stadium, like me, maybe uselessd as a drop-in replacement is the way to go after major subsystems stop working in our favorite GNU/Linux distributions. I personally like what they reduced systemd to (inspired by suckless.org philosophy?), but chances are without support the project ends inside 2 years, and we would be back here duct taping in isolation.

12. 10. 2014.

Beyond type errors

Consider this Python function:

def factorial(n):
    """Returns n! (n factorial)"""

    result = 1
    for i in range(2, n + 1):
        result *= i

    return result

Provided that this code is correct, what kinds of errors (bugs) can happen when this function is used?

First that come to mind are type errors. The factorial function could be called with something that's not a number - for example, a string. Since Python is dynamically-typed, that would result in a runtime exception:

>>> factorial("hello")
TypeError: cannot concatenate 'str' and 'int' objects

Some languages are statically typed so the same type error would be caught earlier, at compile time. For example, in Haskell:

factorial :: (Num a) => a -> a
factorial n = if n == 0 then 1 else n * factorial (n - 1)

Compiling a program that attempts to use factorial with a string would result in a compile-time error such as:

No instance for (GHC.Num.Num [GHC.Types.Char])
arising from a use of `factorial'

Statically-typed languages have the advantage that, when properly used, they can help detect most of the type errors as early as possible.


The second class of errors are value or domain errors. These errors occur when the code is called with arguments of the proper types but unacceptable values - in other words, when the function is called with arguments outside its domain. The factorial function, in math, is defined only for non-negative integers. In code, if the caller supplies -1 as the value of n, that is a value error.

The Python function defined earlier will silently return the incorrect result 1. The Haskell counterpart will run for a long time, consume all available stack, and crash.

The usual way to deal with this problem is by practicing defensive programming, that is, by adding manual guards. The functions might be rewritten as:

def factorial(n):
    """Returns n! (n factorial)"""

    if n < 0:
        raise ValueError('factorial is not defined for negative integers')

    result = 1
    for i in range(2, n + 1):
        result *= i

    return result
factorial :: (Num a, Ord a) => a -> a
factorial 0 = 1
factorial n
    | n < 1     = error "factorial is not defined for negative integers"
    | otherwise = n * factorial (n - 1)

in Python and Haskell, respectively. Note that in both cases, this is a runtime error - it's not possible to detect it at compile time[0].

A formalization of the concept of guards gives us design by contract, in which classes and functions can specify preconditions, postconditions and invariants that must hold in a correct program. The checks are specified in ordinary code and are also run at runtime. Design by contract was introduced in Eiffel but, sadly, never gained mainstream adoption.
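As a rough illustration (my own sketch, not Eiffel syntax or any particular contracts library), a precondition can be expressed as a reusable Python decorator:

def requires(predicate, message):
    """Attach a precondition to a function; it is checked at runtime on every call."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            if not predicate(*args, **kwargs):
                raise ValueError(message)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@requires(lambda n: n >= 0, 'factorial is not defined for negative integers')
def factorial(n):
    """Returns n! (n factorial)"""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result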


The third class of errors is more insidious, to the point where it's not clear whether they should be called errors at all: using a piece of code outside the context it's intended for. Often the code works, but has performance problems, and it is hard to determine where the error boundary is.

To continue with the factorial example, using either the Python or the Haskell function to calculate 1000000! won't work, or it'll run for a long, long time and use a huge amount of memory, depending on the computer it is executed on.

For a more realistic example, consider a piece of code using Django ORM to fetch data from the database:

class Book(models.Model):
    author = models.ForeignKey(Author, related_name='books')
    published = models.BooleanField(default=False)
    ...

class Author(models.Model):
    ...

    @property
    def published_books(self):
        return self.books.filter(published=True)

By itself, it looks rather solid. The results are filtered by the database and only the interesting ones are returned. Also, the Author object hides the complexity of using the correct query[1]. But then comes the caller[2]:

def list_authors():
    authors = Author.objects.all().prefetch_related('books')
    for author in authors:
        print author.name
        for book in author.published_books:
            print book.title

Now, the caller function did try to do the right thing. It told Django it would need to use the related table (books). But what it didn't know is how exactly the published_books property was implemented. It just so happens that the implementation ignores already prefetched books in memory and loads them again from the database - at one SQL query per author.
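One way around this particular pitfall (a sketch using Django's Prefetch object and to_attr, available since Django 1.7; the attribute name is mine, not from the original code) is to prefetch the filtered queryset explicitly and read only from memory:

from django.db.models import Prefetch

def list_authors():
    authors = Author.objects.all().prefetch_related(
        # prefetch only published books and stash them on a separate attribute
        Prefetch('books',
                 queryset=Book.objects.filter(published=True),
                 to_attr='published_books_list'))
    for author in authors:
        print author.name
        for book in author.published_books_list:  # reads the prefetched list, no extra queries
            print book.title

Of course, the caller can only write this if it knows how published_books is implemented - which is exactly the assumption problem described next.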

In both examples[3], the error is not in the function itself, but in the calling code - it has failed to meet some basic assumption that the called code makes. However, these assumptions are impossible to encode in the code itself. The best one can do is to list them thoroughly in the documentation and hope the programmer calling the functions will read it and keep it in mind.


In all three classes of errors, the caller invoked the function in a way that violated some of its assumptions - either about argument types or values, or about the wider context it was running in.

Type errors are easy to catch, especially in statically-typed languages. Value errors require more work (defensive programming or contracts), and are thus often ignored until they happen in production, but they can be dealt with. (Mis)using the code in a different context is something that's not even widely recognized as an error, and the best approach to handling it usually boils down to “don't do this” notes in the documentation.

Can we do better? Can we redefine component interfaces to go beyond types or even values? To describe, guarantee and enforce all behaviour, including time and space complexity, and interactions with the outside world? Would it make sense at all?

I don't know, but it's an interesting line of thought.


Notes:

[0] At least not without making it very impractical to use, or until languages with value-dependent types, such as Idris, become mainstream.

[1] We could have done the same thing in a ModelManager for Book that would wrap the query. The example would have been a bit longer but the point would still stand.

[2] Example is a normal function instead of Django view to ease the pain for non-Django users reading the article.

[3] For another extended example, read my article about Security of complex systems.

26. 09. 2014.

Security of complex systems

As I write this, the Internet is in a panic over a catastrophic remote code execution bug in which bash, a commonly used shell on many of today's servers, can be exploited to run arbitrary code.

Let's backtrack a bit: how is it possible that a bug in a command-line shell is exploitable remotely? And why is it a problem if a shell, designed to help its user run arbitrary code, allows the user to run code? It's complicated.

Arguably, bash is just a scapegoat. Yes, it does have a real bug that causes environment variables with certain values to be executed automatically, without them being invoked manually[0]. But that seems like a minor issue, considering it doesn't accept input from anyone else but the local user and the code runs as the local user.

Of course, there's a catch. Certain network servers store some information from the network (headers from web requests) in an environment variable to pass it on (to the web application). This is also not a bug by itself, though it can be argued it's not the best possible way to pass this information around.

But sometimes, web applications need to execute other programs. In theory, they should do so directly by forking and executing another program, but they often use a shortcut and call the standard system function, which calls the program indirectly - via the shell[1]. As an example, that's how PHP invokes the sendmail program when the developer calls the mail function.

Any one of the above, when taken separately, though not ideal, doesn't seem like a serious problem. It is the compound effect that's terrifying: a remote attacker can plant arbitrary content in an environment variable through a request header, and the moment the web application shells out, bash picks that value up and executes it.

(This is an example with web servers, but other servers may be equally vulnerable - there are proof-of-concept attacks against certain DHCP and SIP servers as well).
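For completeness, the whole chain collapses into one crafted environment variable; this is the classic check, wrapped in Python's subprocess here (an illustration, not taken from the article):

# The classic Shellshock check: a function-looking environment variable with a
# trailing command. A vulnerable bash prints "vulnerable"; a patched one prints only "test".
import os
import subprocess

env = dict(os.environ, x='() { :;}; echo vulnerable')
subprocess.call(['bash', '-c', 'echo test'], env=env)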

So who's to blame? Everybody and nobody. The system is so complex that unwanted behaviours like these emerge by themselves, as a result of the way the components are connected and interact together[2]. There is no single master architect that could've anticipated and guarded against this.

The insight about this emergent behaviour is nothing new, and was in fact described in detail in the research paper How Complex Systems Fail, required reading for ops engineers at Google, Facebook, Amazon and other companies deploying huge computer systems. Although the paper doesn't talk about security specifically, as Bruce Schneier puts it, it's all fundamentally about security.

There is no cure. There's no way we can design systems of such complexity, including security systems, so that they don't fail (or can't be exploited).

The best that we can do is to be well-equipped to handle the failures.


[0] Curiously enough, bash accepts the -r option to activate restricted mode, in which this, and a host of other potentially problematic features, are turned off. The system function doesn't use it though, because that's not a standard POSIX shell option, it's a bash addition. Arguably, bash should detect that it's being called as the system shell and run in POSIX compatibility mode, but compatibility doesn't necessarily forbid adding new features. In fact bash, even when running in POSIX compatibility mode with --posix, has the same behaviour. Turtles all the way down.

[1] There are valid reasons to invoke sub-processes via the shell beyond the convenience of system(3): environment variable expansion (ironic, isn't it?) or shell globbing come to mind.

[2] Note that only this specific combination of components is vulnerable. If the shell used is not bash, there is no problem. For example, dash is the default on newer Debian and Ubuntu systems. These systems may still be vulnerable if the user under which the server is running uses bash instead of the system shell, so the threat is still very real.

22. 09. 2014.

FSec 2014 - I can haz your board with JTAG

fsec2014-jtag.jpg

Last week I had the pleasure of attending FSec 2014, an annual security conference. Just like last year, I gave a hardware presentation, this time about reverse engineering an NComputing CPLD dongle. You can find it at http://bit.ly/fsec2014-jtag or embedded below.

I had a great time at the conference, but I'm somewhat wondering whether the audience got something out of my lecture. It was very interesting for me to figure out the JTAG pinout on this board and connect it to various JTAG programmers (each with its good and bad sides), and I noticed that there isn't any introductory text on the web about how to approach this problem for the first time. So I decided to present this topic in the hope that it will motivate other people to take a hack at some board which would otherwise end up as e-waste or, even worse, in a landfill. And who can resist the call of free hardware which you can re-purpose? :-)

07. 09. 2014.

OpenHantek patch for voltage minimum and maximum

hantek-dso-2090.jpg

I have been using a Hantek DSO-2090 USB oscilloscope for more than half a year now. While scope purists will say that USB oscilloscopes are not good enough for serious work, for my use it's quite sufficient. However, this weekend I was reverse engineering a CPLD with an R2R digital-to-analog converter, and I needed to figure out which steps are produced by turning pins on the CPLD on or off. Sure, I could use a multimeter to do this, but since I already have an oscilloscope, it's a much more powerful tool for a task like this.

When choosing a USB oscilloscope, I searched a lot and decided to buy the Hantek DSO-2090 because it's supported by free software like OpenHantek and sigrok. There are better oscilloscopes out there, but this one is supported by free software, and there is even a detailed teardown which explains how to increase its performance. When the scope arrived, I was quite pleased with OpenHantek, but I never managed to get sigrok working with it. It didn't matter at the time, since OpenHantek had everything I needed. However, for the task at hand I really needed minimum and maximum voltage, as you can see in the video describing oscilloscope usage, and especially the Hantek DSO-2090, including its limits.

openhantek.png

OpenHantek shows just the amplitude of the signal, which is the difference between the minimum and maximum voltage, but doesn't show the raw values which I needed. So I wrote a simple patch for OpenHantek to display the minimum, amplitude and maximum voltage, as you can see in the picture. I also sent a message to the mailing list with the patch, so you might expect to see this change in the next version of OpenHantek.

27. 08. 2014.

E-mail infrastructure you can blog about

The "e" in eCryptfs stands for "enterprise". Interestingly, in the enterprise I'm in, its uses were few and far between. I built a lot of e-mail infrastructure this year. In fact it's almost all I've been doing, and "boring old e-mail" is nothing interesting to tell your friends about. With the inclusion of eCryptfs and some other bits and pieces I think it may be something worth looking at, but first, an infrastructure design overview.

I'm not an e-mail infrastructure architect (even if we make up that term for a moment), or in other words I'm not an expert in MS Exchange, IBM Domino and other "collaborative software", and most importantly I'm not an expert in all the laws and legal issues related to e-mail in major countries. I consult with legal departments, and so should you. Your infrastructure designs are always going to be driven by corporate e-mail policies and local law - which can, for example, require you to archive mail for a period of 7-10 years, and do so while conforming with data protection legislation... and that makes a big difference to your infrastructure. I recommend this overview of the "Climategate" case as a good cautionary tale. With that said I now feel comfortable describing infrastructure ideas someone may end up borrowing from one day.

E-mail is critical for most businesses today. Wait, that sounds like a stupid generalization. What I can say for a fact is that it is critical for the types of businesses I've been working with: managed service providers and media production companies. They all operate with teams around the world, and losing their e-mail system severely degrades their ability to get work done. That is why:

The system must be highly-available and fault-tolerant


Before I go on to the pretty pictures I have to note that I am taking good network design and engineering as a given here. The network has to be redundant well ahead of the services. The network engineers I worked with were very good at their jobs and I had it easy, inheriting good infrastructure.

The first layer deployed on the network is the MX frontend. If you already have, or rent, an HA frontend that can sustain abuse traffic, it's an easy choice to pull mail through it too. But your mileage may vary, as it's not trivial to proxy SMTP for a SPAM filter. If the filter sees connections only from the LB cluster it would be impossible for it to perform well; no rate limiting, no reputation scoring... I prefer HAProxy. The people making it are great software engineers and their software and services are superior to anything else I've used (it's true I consulted for them once as a sysadmin but that has nothing to do with my endorsements). The HAProxy PROXY protocol, or TPROXY mode, can be used in some cases. Or if you are a Barracuda Networks customer instead, you might have their load balancers which are supposed to integrate with their SPAM firewalls, but I've been unable to find a single implementation detail to verify their claim. Without load balancers, using the SPAM filtering cluster as the MX and load balancing across it with round-robin DNS is a common deployment:

Network diagram



I won't say much about the SPAM filter; obviously it's supposed to do a very good job of rating and scanning incoming mail, and everyone has their favorites. My own favorite classifier component for many years has been the crm114 discriminator, but you can't expect (many) people to train their own filters, or to accept that it takes 3-6 months to achieve >99% accuracy; Gmail has spoiled the world. The important thing in the context of the diagram above is that the SPAM filter needs to be redundant, and that it must be able to spool incoming mail if all the Mailstore backends fail.

The system must have backups and DR fail-over strategy


For building the backend, the "Mailstores", some of my favorites are Postfix, sometimes Qmail, and Dovecot. It's not relevant, but I guess someone would want to hear that too.

The eCryptfs (stacked) file-system runs on top of the storage file-system, and all the mailboxes and spools are stored on it. The reasons for using it are not just related to data protection legislation. There are other, faster solutions too: block-level or hardware-based full disk encryption. But, being a file-system, eCryptfs allows us to manipulate mail at the individual (mail) file or (mailbox) directory level. Because of that, encrypted mail can be transferred over the network to the remote backup backend very efficiently. If you require, or are allowed to do, snapshots, they don't necessarily have to be done at the (fancy) file-system or volume level. Common ext4/xfs and a little rsync hard-links magic work just as well (up to about 1TB on cheap slow drives).
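The "little rsync hard-links magic" can be as simple as one dated directory per day with --link-dest pointing at the previous snapshot; a minimal sketch (the paths are examples, not the actual setup):

# Daily hard-link snapshot of the mail store; unchanged files become hard links.
import datetime
import subprocess

SRC = '/var/vmail/'                        # eCryptfs-backed mail store (example path)
DEST = '/backup/mailstore'                 # example destination
today = datetime.date.today().isoformat()

subprocess.check_call([
    'rsync', '-a', '--delete',
    '--link-dest=%s/latest' % DEST,        # link against the previous snapshot
    SRC, '%s/%s/' % (DEST, today),
])
subprocess.check_call(['ln', '-sfn', today, '%s/latest' % DEST])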

When doing backup restores or a backend fail-over, the eCryptfs keys can be inserted into the kernel keyring and the data mounted on the remote file-system to take over.

The system must be secure


Everyone has their IPS and IDS favorites, and implementations. But those, together with firewalls, application firewalls, virtual private networks, access controls, two-factor authentication and file-system encryption... still do not make your private and confidential data safe. E-mail is not confidential, as SMTP is a plain-text protocol. I personally think of it as being in the public domain. The solution for authenticating correspondents and for protecting your data and your company's intellectual property, both in transit and stored on the Mailstore, is PGP/GPG encryption. It is essential.

Even then, confidential data and attachments from mailboxes of employees will find their way onto your project management suite, bug tracker, wiki... But that is another topic entirely. Thanks for reading.

13. 08. 2014.

Kupnja stana: neverending story

This is an old article from my Croatian blog. It would lose much in the translation, so it is reposted as-is. To spare you the effort of learning Croatian: it chronicles my adventures in trying to purchase and furnish an apartment, in a manner similar to Kafka's The Trial, except there's a happy end and I'm not a literary genius.

Pred kraj 2007. godine započeo sam proces kupnje stana, negdje u 6. mjesecu 2009. su počele radnje vezanje uz preuzimanje i namještanje, da bi se većina stvari uspješno završila pred kraj 9. mjeseca.

Da počnem od početka. Kod kupnje nekakve nekretnine kod nas, osim ako imate nasljedstvo ili se bavite sumnjivim poslovima (ili ste napravili uspješan exit svog startupa!), korak broj jedan je dobiti nekakvo kreditiranje. Pred te dvije godine situacija sa dobijanjem kredita je bila mnogo ... ne jednostavnija, ali s većom vjerojatnosti da ćete kredit moći i dobiti ukoliko imate uvjete za njega.

A naklonost banaka prema osobama koje traže kredit je usko vezana uz tip firme u kojem radite:

Prvo su me tražili dokaz o primanjima obrta, pa su zaključili da pošto je obrt dokaz o primanjima ne vrijedi ništa, tako da sam morao imati i sudužnike koji pokrivaju cijeli kredit i hipotetsko osiguranje. Hvala bogu da je zgrada (novogradnja) u kojoj sam kupovao stan bila financirana od strane iste banke, pa su uzeli budući stan pod hipoteku.

Eh da, tako je to bilo tada, jednostavno. Čujem da je sad puno, puno teže, a da hipoteke moraju pokrivati puno veći iznos (od onog koji se diže).

Rješivši financijski dio priče (ukoliko se obavezivanje na poprilično veliku ratu u slijedećih X godina može nazvati rješavanjem financijskog dijela priče), preostalo je samo ugodno čekanje da se zgrada dovrši. Naravno, računao sam sa “Faktorom H” i pretpostavio da će kasniti par mjeseci. Negdje s početkom godine krenuli smo i u lagano traženje namještaja i stvari za stan, s idejom da to kupimo taman negdje mjesec dana prije kompenziranog datuma useljenja, jer i namještaju treba neko vrijeme da dođe do nas.

Kako Hofstadter i kaže, stvar se još više oduljila, za jedno mjesec-dva. Ono što je u toj priči bilo najgore je da su i graditelji i banka znali da se stvar odužuje ali nitko nije želio priznati do zadnjeg trena, što znači da svoju taktiku nismo uspjeli prilagoditi. Rezultat: gomila namještaja u raznim skladištima i telefonsko izgovaranje kako stan još nije spreman. :(

Banka je tu opet posebna priča. Kredit koji su mi odobrili je na tzv. “tranše”. Laički rečeno, daju vam kredit ali vam ne daju novce :), odnosno ne sve odjednom, nego po fazama projekta. Btw, znate ono kad vam banka šalje prijeteće pismo ako zaboravite podmiriti nešto na vrijeme? E moja banka je zaboravila meni (tj graditelju) uplatiti tranšu. Anyhow, zadnja tranša je išla po dovršenju projekta. Kako je projekt kasnio, to je bilo negdje u 6. (umjesto u 3.) mjesecu, a stvar je išla ovako:

Eventually se kredit realizirao u potpunosti i tu naša priča s kupovinom završava, ali gdje završava jedna, počinje druga. Daklem, namještaj i unutarnje uređenje:

Zanimljiv detalj u cijeloj priči je da svi majstori / reklamacije rade od 9 do 5. Što znači za sve popravke treba trčati do stana pričekati majstore. To u kombinaciji sa činjenicom da obavezno kasne (ako imate sreće, kašnjenje se mjeri u satima a ne danima) znači da morate imati ili vrlo tolerantne šefove ili iskoristiti dio godišnjeg.

In parallel with moving in and the remaining work, we were also sorting out an Internet connection. We were in no rush, since we have sufficient net access at work. I had rented an office for my company in time, so I have a fully equipped place to work. But after a workday I don't feel like tweeting and blogging on top of everything else.

For Internet access we have three options: T-Com, some other operator over the telephone copper pair, or B-Net. B-Net already has cabling installed throughout the building, so only a cross-connection in the distribution cabinet is needed. On top of that, they are the cheapest for the basic TV/phone/net package. So they were our first choice. Since we bought a full HD television, we want a digital signal, so we ask about digital packages; they are available for an extra fee. The digital receiver has only a SCART output. Unless we take the HD package, which consists of two screensavers and Nova TV. No thanks.

The second option is Iskon. We prefer them to T-Com because they are cheaper, and besides, they are smaller, so they actually care about ordinary customers. Iskon first has to request a copper pair for us from T-Com. Unfortunately, T-Com refuses, citing no free pairs. Since Iskon is not legally obliged to provide telephone infrastructure, there is nothing they can do.

Unlike them, T-Com is legally obliged to provide telephone infrastructure. So I submit a request for a new line: no contract obligation, with a one-time fee, and with the idea of switching to Iskon immediately. Over 24 months it will still pay off. The deadline for resolving the request is 30 days. After a month I call and get the message that determining the technical feasibility of installing the line is still in progress. After another week or two of daily calls to customer support to find out what's going on, they cancel my request without telling me. Customer support says "presumably there were no technical conditions for installation". Through unofficial channels I hear a rumor that T-Com is in a dispute with someone over some land the cable is supposed to cross.

We conclude that we don't feel like waiting for Godot, so we take B-Net. After only two weeks of waiting from submitting the request, we get the line and return to the 21st century.

Now, this whole story might sound like we had really bad luck. The worst thing about it is that this isn't actually the case: we got off relatively well. We took the loan at a respectable bank; the interest rate, although variable, hasn't gone up yet. The builder is one of the best in Croatia; our balcony didn't fall off, we moved in only 3.5 months after the deadline, we don't have toxic water pipes, the ceiling didn't fall on our heads, and our apartment wasn't also sold to several other buyers. Unfortunately, this is the everyday reality of buying real estate in Croatia.

We also did well with the tradesmen and the furniture: nobody ran off after receiving a hefty down payment. The tradesmen did come back to fix their numerous mistakes, although it often took too much yelling to get things done. We are happy with the furniture, and now that things have mostly calmed down and finished, we can enjoy our new apartment :) And pay off that loan...

Dear reader, if you've made it this far, hats off to you! As a reward, here are a few tips to finish with:

18. 07. 2014.

Learning Go

For the past few weeks I've been looking into Go [0]. It's a rather new language, backed by Google and it seems to have gained a fair amount (relative to its age) of adoption from developers.

These days I'm coding primarily in Python. Apparently, most people switching to Go are users of Python, Ruby, and similar languages. So, naturally, there's a lot of comparison made between these languages: for example, Go is considered by some to be as expressive as Python, but compiled down to native code, so it's faster in execution and has way better concurrency support.

But I have a different comparison to make - to C. Before Python, I was a C programmer [1], and I've actually spent more years coding in C than in Python. While learning Go, I compared it not only to the high-level dynamic Python language, but also to the low-level "portable assembler" language.

For the types of applications I used C for (desktop apps, command line tools, network services - nothing touching raw hardware), Go easily beats C. It seems as if someone sat down, listed all the problems with C that occur in practice, and then designed a language without those problems. In fact, considering who the principal mind behind Go is, that may not be far from the truth.

A few examples:

I've only listed some of the improvements that can be directly compared with C, without touching features like channels and interfaces, which don't have direct counterparts in the land of C.

It is true that from a purely academic perspective, you may not find Go a very interesting language (if you're not into the whole concurrency thing). But from a C developer's point of view, it's a dream come true.

[0] I'm not switching to Go or ditching Python - I'm merely learning a new language
[1] Where I say “C”, I really do mean “C” - not “C and C++”

22. 06. 2014.

DORS/CLUC 2014 conference

IMG_20140616_091523.jpg

Our annual three-day DORS/CLUC conference took place again this year. This time the dates shifted a few weeks later, which resulted in fewer students showing up because of exams, so it was a somewhat different experience than in years before. For a few years now we have not been at the University of Zagreb FER location, which also changed the conference a bit. Having said that, even after the move from FER we still had a bus of students from my own faculty, FOI in Varaždin, but they were missing this year.

It was still a full conference in the new location (on the 2nd floor, not ideal for breaks in fresh air, which is a must if you're staying for 11 hours each day, mind you): the Croatian Chamber of Economy's new and nice conference hall, with wifi that was stable but didn't allow UDP traffic. Neither mosh nor n2n worked for me.

It was also in a very different format. I would love to know whether it worked for people or not. Instead of charging for workshops, they were included in the conference price, and, as every year, if you were interested in a topic, nobody would turn you away from a workshop because of space :-) This also meant that workshops were three-hour slots at the end of the day, after 7 hours of lectures. When the conference started, we were afraid of how we would accommodate all those people at the workshops, but sense prevailed and about 20 or so people stayed for the workshop each day.

Parallella and Epiphany 16 core mesh CPU

presentation

I gave a 5-minute lightning talk about Parallella, and hopefully managed to explain that there is now an interesting dual-core ARM board with DSP-like capabilities backed by OpenCL and an FPGA. This is a unique combination of processing power, and it would be interesting to see which part of this machine can run OpenVPN encryption best, for example, because it has a 1Gbit/s ethernet interface.

ZFS workshop, updated to 0.6.3

presentation

ZFS on Linux had its 0.6.3 release just in time, and I presented a two-and-a-half-hour workshop about ZFS for which 10-20 people stayed, after 7 hours of presentations. I'm afraid I somewhat failed to show enough on the command line, because I was typing too little. I did manage to show what you get if you re-purpose several-year-old hardware for ZFS storage: something along the lines of 2004-era hardware with 8 SCSI disks.

I managed to create a raid-10-like setup, but with all the benefits of ZFS, then fill it up and scrub it during the workshop.

root@debian:/workshop# zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
workshop              268G    28K   268G  /workshop
workshop/test1        280K    28K   144K  /workshop/test1
workshop/test1/sub1   136K    28K   136K  /workshop/test1/sub1
root@debian:/workshop# zpool status
  pool: workshop
 state: ONLINE
  scan: scrub repaired 0 in 0h44m with 0 errors on Tue Jun 17 17:30:38 2014
config:

        NAME                                      STATE     READ WRITE CKSUM
        workshop                                  ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02KAT  ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02KBB  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02KCK  ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02KDD  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02L4S  ONLINE       0     0     0
            scsi-SFUJITSU_MAS3735NC_A107P4B02L4U  ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            scsi-SFUJITSU_MAW3073NC_DAL3P6C04079  ONLINE       0     0     0
            scsi-SFUJITSU_MAW3073NC_DAL3P6C040BM  ONLINE       0     0     0

errors: No known data errors
I think it might be a good idea to pxeboot this machine on demand (for long-term archival storage) and copy snapshots to it on a weekly basis, for example. Think of it as a tape alternative (quite small, 300G) but with rather fast random IO. The idea was to use this setup as a ganeti-backup target, but the dump format of the ext file-system forced us to use zfs volumes to restore backups on another RAIDZ1 4*1.5T SATA pool, and it was very slow.
In its current state, it can receive zfs snapshots at 30-40 MB/s, using a single core for ssh, which is the bottleneck. More benchmarks have to be done on this machine to see whether it's worth the electricity it's using...

Ganeti - our own cloud

presentation

Another interesting part of last year's infrastructure work for me was with Luka Blašković. We migrated all servers from the faculty and the library to two Ganeti groups. We are running a cluster of reasonable size (10+ nodes, 70+ instances). Everything is built from legacy hardware which is now much better utilized. Some machines were never backed up or firmware-upgraded, so this was the first time in the last 10 years that they got this kind of maintenance. Now we can move VM instances to another machine, and we are much more confident that services will stay running: via live migration for scheduled maintenance, or by restarting instances elsewhere in case of hardware failure.

For the workshop, we decided to chew a bit more than we could swallow. We spun up KVM images on our ganeti cluster and went through installing a workshop ganeti on them and joining them into a new cluster. This went fairly well, but when we started configuring xen to spawn new instances (ganeti kvm with ganeti xen on top of it) we ran into some problems with memory limits, which we managed to fix before the end of the workshop.
In our defense, we really believe the workshop was more interesting this way, probably because people didn't want to leave (the few brave ones who were with us all the way to the end, that is). When you try to deploy something as complex as Ganeti you will run into problems, so seeing the troubleshooting methods used is usually as helpful as the solution itself.

All in all, it was interesting and very involved three days. Hope to see you all again next year.

29. 05. 2014.

parallella - first week with a supercomputer

IMG_20140508_110839.jpg

After 18 months, the Parallella kickstarter project delivered and I got the device into my hands. To be honest, I was prepared to write off the $100 for it, but decided to support the project because I believe we should have alternative architectures in development, and Epiphany had such a goal.

As you can see in the picture, I got the parallella board and a heatsink for the FPGA in a nice box, together with the packing slip. The heatsink is a recent addition because the FPGA gets very hot. However, it's not enough, because you will also need some airflow over it to ensure stable operation. And a 5V 2A power supply. So, I decided to do some research before the first power-on, because burning the board on the first try is not a good option.

This is where the Parallella forums came in very handy. They are full of a very supportive community, and for learning how to use your board they are a better place than the official documentation (and more up to date). There you can learn that there are jumpers on the board to provide 5V for a fan, and various other hints about the platform, including the ability to power the board over the USB connector, which proved helpful since I could use a 2A Nexus power supply.

The official image for Parallella is based on Ubuntu (which I don't like much; it doesn't even mount devtmpfs by default), so I opted to install the unsupported Debian installation and tried to lower power usage by disabling HDMI support, since I'm not using it. Thanks to the helpful parallella community and the forum post about alternative parallella bitstreams and device trees, I was successful in that task, lowering the power draw to ~0.75 A in idle mode and ~0.86 A while testing with aobench from parallella-examples. CPU load alone (two arm cores) seems to consume ~0.81 A. For comparison, the HDMI bitstream consumes ~1.03 A in idle and ~1.19 A under load. All values are the maximums I measured using a USB charger doctor, so they might not be the most precise.

IMG_20140525_130254.jpg

To cool the device, I salvaged a small fan from an old disk drawer and attached it to the board using zip ties.

Power is supplied from a USB port on a PC (for now), but the next logical step is to connect it to the jumpers on the board and print a case for it on a 3D printer. This involves messing about with 3D software to design the case, so it might take some time. However, so far I'm very happy with my new toy.

18. 05. 2014.

Type checking in Python

One of the defining properties in Python is its dynamic type system. This is both a blessing and a curse. The benefits are probably obvious to every Python programmer.

One downside is that it lets through a class of simple, but very easy to make, errors that could easily be caught by a static type system. In languages such as Python, these errors slip through unless you have good automated test coverage.
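
A tiny, purely illustrative example of the kind of error that slips through:

def total_price(quantity, unit_price):
    return quantity * unit_price

# Passing a string where a number was meant doesn't fail here;
# "3" * 10 silently evaluates to the string '3333333333'...
order_total = total_price("3", 10)

# ...and the real failure only surfaces later, far from the actual mistake:
order_total_with_tax = order_total * 1.25   # TypeError: can't multiply sequence by non-int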

Another downside is that we lose the benefits of specifying types directly: it can help with readability of the code, and is especially useful when documenting an API (be it an external library or an internal component). In Python, for example, the standard practice is to document the types (and meaning) of function arguments and return values in a docstring, in a special Sphinx-recognized syntax. So we do have to spell out the types manually anyway, but that's of no use to the interpreter!
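
For illustration, such a docstring for a trivial function might look like this (the :param:/:type:/:rtype: fields below follow the common Sphinx convention and are shown only as an example):

def add(a, b):
    """Add two integers.

    :param a: first operand
    :type a: int
    :param b: second operand
    :type b: int
    :returns: sum of the operands
    :rtype: int
    """
    return a + b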

This is recognized as a problem to the extent that there are several Python packages attempting to solve it (typecheck-decorator, typecheck3, typechecker, typeannotations, with the most active one appearing to be PyContracts), and there's even a Python3 PEP designed to help with it: PEP-3107 (although it is general enough that it can be used for other purposes as well, this was one of the primary concerns). In fact, Guido van Rossum posted a series of articles on that very topic way back in 2004 and 2005 (Adding optional static typing to Python, part1, part2, part3, redux).

Since the topic is interesting to me, and this being a series of programming experiments, I decided to implement my own solution to this problem. Although the main motivation was to have fun, I believe the solution might actually be useful in the real world, and that it has some benefits over existing ones: expressiveness, clean, readable syntax, and Python 2 support.

Here's how it looks: this snippet defines a function taking two integers, adding them, and returning their result, which is also an integer:

@returns(int)
@params(a=int, b=int)
def add(a, b):
    return a + b

Simple, right? Here's a little more complex one:

class MyObject(object):
    name = ...

@returns({str: [MyObject]})
@params(objs=[MyObject])
def group_by_name(objs):
    groups = defaultdict(list)
    for obj in objs:
        groups[obj.name].append(obj)
    return groups

Pretty readable, eh?

The type signatures can be arbitrarily complex so it can support the majority of use cases in the real world. The major missing part is support for union types, for arguments which can be of a few distinct types (often, the actual value type and None representing the default value). In these cases, you need to use object, which matches any type.
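
Based only on the forms shown above (scalar types, lists, dicts, and object for "anything, possibly None"), a hedged sketch of a more involved signature might look like this; the function and argument names are made up for illustration, and the import line assumes the decorators are exposed at the package top level:

# assumption: params and returns can be imported like this (see the repository docs)
from typedecorator import params, returns

@returns({str: int})
@params(texts=[str], weights=object)   # weights may be a dict of ints or None, so use object
def total_lengths(texts, weights):
    totals = {}
    for text in texts:
        totals[text] = len(text) * (weights or {}).get(text, 1)
    return totals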

Since the behaviour doesn't rely on Python 3 annotations, Python 2 is supported as well (in fact, it works on any version of Python from 2.5 onwards).

Another feature I added is logging support and the ability to enable or disable the type checks at runtime. This is useful when running code in production, where you don't necessarily want to crash the application due to a type check assertion, but you probably do want to log that it happened.

Here's an example of a log created when calling the above add function incorrectly:

ERROR:typedecorator:File "example.py", line 11, in some_caller: argument a = 'a' doesn't match signature int: add('a', 1)

The code for all of this is stable, tested, published on GitHub and available from PyPI. If you want to play with it, head on to typedecorator repository on GitHub for more docs. If you do try it out, I'd love to hear your comments and suggestions.

I have some ideas about additional stuff that could go into it, which I'll probably cover in some future installment of the programming experiments series, so stay tuned!

30. 04. 2014.

Maybe in Python

This post talks about a neat trick for simplifying program flow in Python. If you know Haskell, you'll recognize it as the Maybe monad. If you're more of a Scala or OCaml type of person, it's an Option. If OOP and design patterns rock your boat, it looks eerily like the Null Object Pattern.

Here's a problem to start with: imagine you have a function that deals with several variables. For example, it might do a calculation or perform some I/O based on the variables. One (or more) of them may be not supplied, not valid, unknown, or have to be similarly special-cased.

The naive code might look like:

foo = get_foo() # may return None if we can't get 'foo'
foo_squared = foo * foo
bar = ... # doesn't depend on foo
baz = foo_squared * bar
print foo, bar, baz

This doesn't handle the fact that foo might not be known (i.e. have the value of None here), in which case the program will happily crash.

No worries, we'll just add checks where appropriate, right?

foo = get_foo() # may return None if we can't get 'foo'

if foo is None:
    foo_squared = None
else:
    foo_squared = foo * foo

bar = ... # doesn't depend on foo

if foo_squared is None:
    baz = None
else:
    baz = foo_squared * bar

print foo, bar, baz

This works correctly (unless I made a mistake), but is ugly and the actual calculation we tried to do is hidden between the special-case checks. In this small example, the calculation can be reordered to simplify it a bit - finding more complex examples of the same problem in real-world code is left as an exercise for the reader.

Instead, let's define something called Maybe, that can be either Nothing (which means there's no value of interest), or Just(value) if it does hold a useful value. Furthermore, let's define that any operation that involves a Nothing immediately results in Nothing. Operations that involve a Just(value) will compute the result as usual, but then additionally wrap it in Just again, to produce Just(result).

The above function then looks something like:

foo = maybe_get_foo()  # returns  Just(<value>) or Nothing
foo_squared = foo * foo
bar = ... # doesn't depend on foo
baz = foo_squared * bar
print foo, bar, baz

Much better.
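
Before looking at the full implementation, here is a deliberately minimal sketch of the propagation idea (my own illustration, nothing like the complete library; it only handles multiplication):

class _Nothing(object):
    def __mul__(self, other):
        return self                      # anything involving Nothing is Nothing
    __rmul__ = __mul__
    def __repr__(self):
        return 'Nothing'

Nothing = _Nothing()

class Just(object):
    def __init__(self, value):
        self.value = value
    def __mul__(self, other):
        if other is Nothing:
            return Nothing               # Nothing swallows the whole expression
        if isinstance(other, Just):
            other = other.value
        return Just(self.value * other)  # compute as usual, re-wrap in Just
    def __repr__(self):
        return 'Just(%r)' % (self.value,)

# Just(3) * Just(4)  -> Just(12)
# Just(3) * Nothing  -> Nothing
# Nothing * Just(4)  -> Nothing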

How hard is it to define such a construct in Python? As it turns out, not that hard. Here's a complete implementation, with documentation, tests and a license, in less than 250 lines - maybe.py. It doesn't cover all the operators possible (patches welcome), but it does cover most of the usual suspects.

Functional programming aficionados will probably balk both at the implementation and the usage. There's objects, operator overloading, metaclasses and magic mocking of attributes and function calls. And stuff like this works:

>>> Just('hello')[:-1].upper()
Just('HELL')
>>> Just(Nothing)[:-1].upper()
Nothing

Is it really a monad, then? Yes, it is - the relevant axioms hold (see the docstrings). However, it doesn't try to shoehorn Lisp, Haskell or Scala syntax into Python (if you're into that, fn.py might be of interest). It uses Python's strengths instead of awkwardly stepping around its "not really a functional programming language" limitations.

And that's why it was fun to write.

20. 04. 2014.

Perfect Fedora Desktop in 5 easy steps

 
Fedora is an awesome distro, but it lacks a bit of polish to be a usable work-and-pleasure desktop out of the box. Follow these 5 easy steps to make a perfect Fedora desktop. Please also share how you make your Fedora install perfect.
 
1. The first step is updating the whole system, and it's the one I hate the most, so let's just get it over with…
 
sudo dnf update -y
 

2. Now let's install Fedy, a tool that lets you tweak lots of additional things quickly, streamlines installation of software and tweaks, and is really simple to use:
 
su -c "curl http://satya164.github.io/fedy/fedy-installer -o fedy-installer && chmod +x fedy-installer && ./fedy-installer"
 

3. Fedy also has a nice GUI, but once you get to know what it can do, it is faster to do everything via the command line:
 
sudo fedy --exec sublime_text3 touchpad_tap rpmfusion_repos media_codecs skype_linux tor_browser adobe_flash nautilus_dropbox teamviewer_linux
 

4. Now let's install some additional goodies, using the dnf tool instead of yum because it is much faster:
 
sudo dnf install synapse faience-icon-theme clipit vlc qbittorrent krusader filelight k3b-extras-freeworld redshift-gtk htop lm_sensors filezilla @cinnamon-desktop xchat pidgin gnome-tweak-tool
 

5. And the best for last: compile and install tilda, which is just the best terminal ever:
 

sudo dnf install git automake libconfuse-devel vte3-devel gtk3-devel glib-devel gettext-devel gcc
git clone https://github.com/lanoxx/tilda.git
cd tilda/
./autogen.sh --prefix=/usr
make --silent
sudo make install

 

That is it; just don't forget to switch to Cinnamon as your default desktop the next time you log in, and change the icons to Faience. Enjoy your perfect Fedora desktop!
 

12. 04. 2014.

Fixing Debian dependencies using a fake package

A few days ago, I noticed an odd problem with the koha-common package. It depends on mysql-client, which on squeeze tries to install version 5.1, which conflicts with my installation that uses the Percona MySQL build. How can we fix this?

As it turns out, it is rather easy. I will just create a fake package which provides mysql-client and in turn depends on percona-server-client, using something like:

koha-dev:/srv# cat mysql-client-fake/DEBIAN/control 
Package: mysql-client-fake
Version: 0.0.1
Section: database
Priority: optional
Architecture: all
Depends: percona-server-client
Provides: mysql-client
Suggests:
Conflicts:
Maintainer: Dobrica Pavlinusic <dpavlin@rot13.org>
Description: Provides mysql-client for percona build
koha-dev:/srv# dpkg-deb -b mysql-client-fake .
dpkg-deb: building package `mysql-client-fake' in `./mysql-client-fake_0.0.1_all.deb'.
koha-dev:/srv# dpkg -i mysql-client-fake_0.0.1_all.deb 
(Reading database ... 59348 files and directories currently installed.)
Preparing to replace mysql-client-fake 0.0.1 (using mysql-client-fake_0.0.1_all.deb) ...
Unpacking replacement mysql-client-fake ...
Setting up mysql-client-fake (0.0.1) ...
Quick and easy. Before you start bashing Debian for this, keep in mind that both the Koha and Percona MySQL builds are not official Debian packages, so it's not really the Debian developers' problem.

Update: This problem occurs because the Debian developers decided to use virtual-mysql-server and virtual-mysql-client Provides, so Percona changed its Provides to virtual-mysql-server, but the Koha package still requires the older mysql-client.

18. 03. 2014.

Linux Mint sets up the wrong default PDF viewer and folder launchers

 
Linux Mint has an issue with some default apps that are launched from the Firefox and Chrome browsers. For example, GIMP is launched as the PDF viewer instead of a proper PDF viewer. After investigating, this looks like a common issue for lots of people using the Mate version of Linux Mint. Probably some updates are to blame.
 
As usual, the best place to find good info is the Arch Wiki, which has some great documentation about setting default app launchers.
 
The issues for me were opening PDF files and directories from Firefox and Chrome. To check the current default apps, 'xdg-mime' is used:
xdg-mime query default inode/directory
xdg-mime query default application/pdf

 
and a quick fix for my two issues was:
xdg-mime default atril.desktop application/pdf
xdg-mime default caja.desktop inode/directory

 
and now just test whether the new launchers work as expected:
xdg-open ~/Desktop/
xdg-open ~/Downloads/Demo.pdf

 

ps. Ask Fedora has a really informative page regarding default apps on Fedora.
 

16. 03. 2014.

Building custom OpenWRT image for home router

Finally I decided to upgrade my wireless network to 802.11n, and to do so I picked up a cheap TP-Link TL-WR740N and decided to install OpenVPN, n2n and a munin node on it. This is where the problems started, because a simple opkg install openvpn filled up the whole file-system. Instead of declaring failure on this front, I decided to ask a friend how to make this work...

The reason for this upgrade was a change in the router provided by my ADSL provider. I didn't have any administration privileges on it, and it was only an 802.11g device, so my previous configuration with igel, which provided pppoe, wasn't possible any more (since I can't put the ADSL router into bridge mode). So I decided to scrap igel and move openvpn and n2n to the TP-Link instead (which will also help with heat dissipation in the closet which hosts all those devices).

Since the router has just 4MiB of flash storage, installing large packages is not a solution on this platform. However, all is not lost, and there is an alternative way to make this work. The trick is in the way OpenWRT uses flash storage. The image which you download from the internet contains a squashfs (which is compressed) that enables really efficient usage of the storage on the router itself. All additional packages are installed into an overlay file-system, which doesn't support compression, so you will fill the root file-system really quickly. However, there is a solution. The OpenWrt project provides the Image Builder, which enables you to select the packages which are included in the base installation and thus end up in the squash file-system, nicely reducing the need for flash storage. Even better, you can also exclude packages which you are not going to use. However, to make this really useful you also have to provide a files directory which contains the modifications needed to make your specific router configuration work (like IP addresses, OpenVPN keys, n2n keys and similar modifications).

First, I downloaded OpenWrt Barrier Breaker (Bleeding Edge Snapshots) and created a files directory in which I will create the files which are specific to my setup. For a first build (to make sure that it works) I just copied /etc/config/network into it and rebuilt the image with

make image PROFILE=TLWR740 PACKAGES="-dnsmasq -ip6tables -ppp \
 -ppp-mod-pppoe -kmod-ipt-nathelper -odhcp6c \
 openvpn-openssl n2n muninlite" FILES=../files/
I didn't need dnsmasq (because the ADSL modem will provide DHCP service for my network) and, along the same lines, I also excluded ppp and nat, but added openvpn-openssl, n2n and muninlite (a munin node written in C).
After the rebuild, I copied the created image to the router and started the upgrade with
scp bin/ar71xx/openwrt-ar71xx-generic-tl-wr740n-v4-squashfs-sysupgrade.bin root@192.168.1.2:/tmp/
ssh root@192.168.1.2 sysupgrade -v /tmp/openwrt-ar71xx-generic-tl-wr740n-v4-squashfs-sysupgrade.bin
Then I held my breath, and after re-flashing the router it rebooted and connected to my network. So far, so good. Now I had all the required packages installed, so I started configuring them to my specific needs. In the end, I had the following configuration files, which I copied back to my files folder:
dpavlin@t61p:~/openwrt$ find files/
files/
files/etc
files/etc/config
files/etc/config/system
files/etc/config/network
files/etc/config/wireless
files/etc/config/openvpn
files/etc/config/n2n
files/etc/openvpn
files/etc/openvpn/tap_home.conf
files/etc/openvpn/tap_home.sh
files/etc/openvpn/prod.key
files/etc/init.d
files/etc/init.d/openvpn
files/etc/dropbear
files/etc/dropbear/authorized_keys

After another rebuild of the image to make sure that everything works, I was all set with the new router for my home network.

06. 03. 2014.

A tale of false alarm by ConfigServer, CPanel and a hosting provider.


I'm responsible for a couple of CPanel/WHM managed dedicated servers.

We keep them updated, and try to do as little customization as possible outside of what cPanel knows about. We enabled mod_proxy_fcgi and PHP-FPM, so we can use the Apache 2.4 MPM Event for our fairly high traffic web site. It's unfortunate that CPanel doesn't have this configuration available out of the box, but that's for another blog post.

Today early in the morning we got a message from our lfd daemon (a service installed by a free ConfigServer Security & Firewall CPanel plugin installed by our hosting provider):

The following list of files have FAILED the md5sum comparison test. This means that the file has been changed in some way. This could be a result of an OS update or application upgrade. If the change is unexpected it should be investigated:
/usr/bin/ghostscript: FAILED
/usr/bin/gs: FAILED

The funny thing is, nothing upgraded any RPM files in this time window: our /var/log/yum.log didn't mention any upgrades to the ghostscript package that provides the /usr/bin/gs binary (/usr/bin/ghostscript is a symlink to gs), we have disabled the automatic updates that can be initiated by the cpanel upcp --cron script, and the system is regularly kept up to date manually with yum update.

I've reinstalled the package with yum reinstall ghostscript (ghostscript-8.70-19.el6.x86_64 was reinstalled)

and the binary size and md5sum changed like this:

before:
size: 19152 bytes
md5sum: c64b5016d94450b476148c31cfef61ff

after reinstall:
size: 6760 bytes
md5sum: 73db43e258c4b191757b7ba75a883321

This is what actually happened: our managed hosting provider had apparently changed our setup to upgrade our system packages automatically (probably with the best intentions, due to the recent gnutls issue). And prelinking seems to be enabled on our system, so when upcp (the CPanel automatic upgrade cron script that runs periodically) executed /usr/local/cpanel/scripts/rpmup to upgrade system packages, it also did the prelinking step, adding extra prelinking data to our /usr/bin/gs binary.

Similar issue described here:

http://linsec.ca/blog/2012/01/23/rpm-v-and-prelinked-binaries/


02. 03. 2014.

True problems of software development

I'm halfway through Patterns of Software, a collection of essays by Richard Gabriel (one of the creators of Common Lisp). The book approaches problems in software development from a philosophical standpoint and is heavily influenced by the works of Christopher Alexander, the architect who started the entire Design Patterns movement.

As a lead of a software development consultancy, I'm in daily contact with people who find it hard to grasp why a software project can be hard to plan, deadlines and cost hard to estimate, even for experienced developers. In Richard's book I found an excellent explanation:

The true problems of software development derive from the way the organization can discover and come to grips with the complexity of the system being built while maintaining budget and schedule constraints.

He then goes on to explain:

It is not common for organizations to try to put together a novel large artifact, let alone doing it on schedule. When an engineering team designs and builds a bridge, for example, it is creating a variant of a well-known design, and so many things about that design are already known that the accuracy of planning and scheduling depends on how hard the people want to work, not on whether they can figure out how to do it.

This matches my experience well. Any non-trivial software development project is largely a research project as well. If it weren't, it'd already be available as an existing off-the-shelf solution.

The entire book is a great treatise on software development and software quality, and I heartily recommend it to anyone interested in thinking about software, design patterns and code quality. The book is freely available online, in PDF format.

17. 02. 2014.

OpenVPN client on Raspberry Pi

 
This article is written despite there being lots of blog posts on this topic; most of them don't take into account some best practices, and have redundant and sometimes wrong information.
 
So if you wish to use your Raspberry Pi as an OpenVPN client and configure it the RightWay(tm), then you have come to the right place :)
 
First you need the certificate files. If you are also the admin of the OpenVPN server, then you need to know how to create these files (not covered in this article); if you are not, then you should ask the admin of the OpenVPN server to send these files to you.
 
The first file you need is the Certificate Authority certificate, usually named ca.crt. The other two are client specific and unique for each client; for this example I'll use raspberry.key and raspberry.crt.
 
First install the openvpn package:
sudo apt-get install openvpn
 
Now create the config file for OpenVPN:
vi /etc/openvpn/client.conf
and use these settings:
client
dev tun
port 1194
proto udp

remote CHANGE-ME-SERVER 1194 # VPN server IP : PORT
nobind

ca /etc/openvpn/ca.crt
cert /etc/openvpn/raspberry.crt
key /etc/openvpn/raspberry.key

comp-lzo
persist-key
persist-tun

verb 3


 
Copy the certificates and the key to the /etc/openvpn/ directory on your Raspberry Pi
 
Start the OpenVPN service:
sudo /etc/init.d/openvpn start
 
Troubleshooting
If the OpenVPN service is not starting, take a peek into your log file:
tail /var/log/daemon.log
 
External links:
  • OpenVPN on Debian WIKI

    17. 01. 2014.

    Adding RTC to Embedded devices running OpenWrt (part 1)

     
    We take real time clock (RTC) for granted, as all of our gadgets keep correct time even when we turn them off.
     
    Now imagine that you have to set correct time each time you power on any of your gadgets, that would be a nightmare, right? :)
     
    Most of the smaller devices we use, like routers, don't have an RTC onboard, and as I have been using OpenWrt on lots of different devices, I know that OpenWrt can use bitbanging to simulate the i2c protocol on GPIO pins.
     
    With a real time clock, all of my smaller embedded gadgets would be one step closer to being as independent as their bigger android and pc cousins.
     
    There are really cheap RTC devices on Ebay; I got one based on the DS1307 chip and AT24C32 memory, which uses the i2c protocol for communication.
     

    The first step is to get i2c working on OpenWrt and then to connect these two devices.
    Installing the necessary drivers and tools comes first:

    opkg update
    opkg install kmod-i2c-gpio-custom i2c-tools

     
    Now choose which two GPIO pins you would like to use as the i2c SDA and SCL pins. The choice depends on which device you are using. Some devices have unused GPIO pins, but some don't, and you will have to remove LEDs and use those pins for i2c communication. For this example I'll create i2c bus0 with GPIO pins 3 and 4 as the SDA and SCL pins.


    insmod i2c-dev
    insmod i2c-gpio-custom bus0=0,3,4

     

    Now test that the bus is correctly initialized and scan it for connected i2c devices.


    i2cdetect -l
    i2cdetect 0

     
    # i2cdetect -y 0
    0 1 2 3 4 5 6 7 8 9 a b c d e f
    00: -- -- -- -- -- -- -- -- -- -- -- -- --
    10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
    50: -- -- -- -- -- -- -- 57 -- -- -- -- -- -- -- --
    60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 6f
    70: -- -- -- -- -- -- -- --

     

    root@OpenWrt:/# i2cdump -y 0 0x6f
    No size specified (using byte-data access)
    0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
    00: 05 00 00 01 01 01 01 80 00 00 00 00 00 01 01 01 ?..?????.....???
    10: 01 00 00 00 01 01 01 01 00 00 00 00 00 00 00 00 ?...????........
    20: 54 11 8a 61 00 21 00 49 a2 00 85 00 93 02 49 0a T??a.!.I?.?.??I?
    30: 0a c2 b7 a2 03 e0 05 00 a0 da 98 40 28 a0 18 20 ???????.???@(??
    40: 88 60 8a 14 22 2e 08 00 42 45 ce 30 54 4c c2 4e ?`??".?.BE?0TL?N
    50: a6 25 08 60 82 93 05 81 82 30 28 a0 01 c4 06 b0 ?%?`?????0(?????
    60: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    70: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    80: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    90: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    a0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    b0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    c0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    d0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    e0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX
    f0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XXXXXXXXXXXXXXXX

     
    To be continued.
     

    15. 01. 2014.

    Linux and OSX

    Yesterday I tweeted this:

    A long time ago I was a full-time Linux user and occasional Windows user. Apparently now I'm a full-time Mac user and occasional Linux user.

    The tweet, and the syndicated Facebook post, about what is in reality a pretty inconsequential thing, got more responses than some of the more serious stuff I try to put out there occasionally. A lot of my friends commented that I've gone over to the dark side and asked which I preferred.

    The funny thing is, I don't think of it as a switch at all. I still use both. The reason I'm mainly on OSX these days is that I use Mac hardware, which beat the alternatives (in the same price and performance class) at the time when I was last shopping for a laptop.

    Here's what software I used daily while Linux was my primary desktop system:

    Here's what I use now:

    Not much difference, is it? Actually, even the wallpaper I use is the same.

    Of course there are differences. But they seem pretty minor. OSX feels just like another flavor of Unix. Even the system interface is very similar, due to the fact that GNOME, which I was using on Linux, is heavily influenced by OSX interface.

    Honestly, the biggest difference I found were the keyboard shortcuts. While I could've remapped most of them to Linux version, I decided to learn the native ones. The only thing I did remap is the Mac HR keyboard layout to conform to PC HR layout.

    So which one do I prefer?

    I like how well the integration between OSX and Mac hardware is done. This is certainly one of Apple's strengths. I like Lenovo, especially the X series, and Linux is supported very well on it, but the Mac level of hardware+software fusion can only be achieved by someone who controls the entire stack.

    The other thing that OSX does well is the OS polish. The results of hundreds of well-paid people obsessing over tiny details show. This is something that open source can hardly match (at least for now). Canonical could've done it, but they blew it. LinuxMint and ElementaryOS come close, but they're a handful of hobbyists.

    What about Linux? More flexible. More supported hardware. Way better development environment. Massive choice of environments (or anything, really): you could never do a Crunchbang on top of OSX.

    Which is better? Who cares! I love the fact that I have two awesome desktop operating systems I can choose from.

    11. 01. 2014.

    Load balancing Redis and MySQL with HAproxy

    It's a common occurrence to have two or more load balancers as HA frontends to databases at high traffic sites. I've used the open-source HAproxy like this, and have seen others use it. Building this infrastructure and getting the traffic distributed evenly is not really the topic I'd like to write about; rather, what happens after you do.

    Using HAproxy like this in front of replicated database backends is tricky: a flap on one part of the network can make one or more frontends activate the backup backends. Then you have a form of split-brain scenario on your hands, with updates occurring simultaneously on all masters in a replicated set. Redis doesn't do multi-master replication, and it's easier to get into trouble, even with just one HA frontend, if the old primaries happen to be reactivated before you've synced them with the new ones.

    One way to avoid this problem is building smarter infrastructure: offloading health checks and role direction to an independent arbiter. But having one makes it a single point of failure, and having more makes it another replicated nightmare to solve. I was never keen on this approach, because solving it reliably is an engineering challenge each time, and I have the good sense of knowing when it can be done better by smarter people.

    Last year I pestered the HAproxy developers to implement cheap features as a start: say, a new special directive to keep the old primary permanently offline once a fail-over to a backup happens, which would be more reliable than gaming health check counters. The request was of course denied; they are not in it to write hacks. They always felt that agents are the best approach, and that the Loadbalancer.org associates might even come up with a common 'protocol' for health and director agents.

    But the developers heard my case, and I presume that of others who discussed the same infrastructure. HAproxy 1.5, which is about to be released as the new stable branch (source: mailing list), implements peering. Peering works with the help of stick-tables, whose other improvements will bring many advancements in handling bad and unwanted traffic, but that's another topic (see the HAproxy blog).

    Peering synchronizes server entries in stick-tables between many HAproxy instances over TCP connections, and a backend failing health checks on one HA frontend will be removed from all of them. Using the documentation linked above, here's an example:

    peers HAPEERS
        peer fedb01 192.168.15.10:1307
        peer fedb02 192.168.15.20:1307
    
    backend users
        mode tcp
        option tcplog
        option mysql-check user haproxy
        stick-table type ip size 20k peers HAPEERS
        stick on dst
        balance roundrobin
        server mysql10 192.168.15.33:3306 maxconn 500 check port 3306 inter 2s
        server mysql12 192.168.15.34:3306 maxconn 500 check port 3306 inter 2s backup
    
    #backend uploads
    
    When talking about Redis in particular, I'd like to emphasize the improvements in HAproxy 1.5 health checks, which allow us to query Redis nodes about their role directly and fail over only if a backend has become the new master. If Redis Sentinel is enabled and the cluster elects a new master, HAproxy will fail traffic over to it transparently. Using the documentation linked above, here's an example:
    backend messages
        mode tcp
        option tcplog
        option tcp-check
        #tcp-check send AUTH\ foobar\r\n
        #tcp-check expect +OK
        tcp-check send PING\r\n
        tcp-check expect +PONG
        tcp-check send info\ replication\r\n
        tcp-check expect string role:master
        tcp-check send QUIT\r\n
        tcp-check expect string +OK
        server redis15 192.168.15.40:6379 maxconn 1024 check inter 1s
        server redis17 192.168.15.41:6379 maxconn 1024 check inter 1s
        server redis19 192.168.15.42:6379 maxconn 1024 check inter 1s
    

    15. 12. 2013.

    Raspberry Pi breakout board and USBee AX Pro clone

    As you may know by now, a few months ago I built 433 MHz control of power sockets using rc-switch. Since then, I somehow lost the remote control for it, so I decided to build a more permanent solution than a bunch of wires. This time around I also used a USBee AX Pro clone to check voltage levels, which meant that I first had to make it work on Linux.

    IMG_20131215_184439.jpg

    You can see the final result in the picture included in this post. It was a mixed experience with a few surprises that might be useful to other people hacking with the Raspberry Pi, so I decided to write this post. As I'm somewhat cheap, I didn't want to buy the Adafruit Pi Cobbler, but instead went with a Raspberry Pi GPIO adapter board module which has the additional benefit of routing 5V and 3.3V to the power rails on the breadboard. I already had a 26 pin cable, but if you don't have one, make sure you get that also. The breakout board also has markings for wiringPi (as opposed to one of the other variants of GPIO pin naming on the pi). When the breakout board arrived I noticed something interesting: it requires power pins which are aligned with the holes on the main board. Examine the picture below to see the difference.

    breadboard-power-pins-aligment.jpg

    As you can see on the top 400 point breadboard, the power rail pins are not aligned with the pins of the main area, which means that you won't be able to plug the breakout board into the power rails. I was fortunate enough that another 800 point breadboard had its power pins aligned with the main area, but none of the smaller 400 point breadboards I have are correctly aligned. The picture on the seller's site does show a 400 point breadboard, but the adapter board there shows V2.0 while mine is V2.2; still, judging from the picture, 400 point breadboards with aligned pins do exist.

    This time around, I also received a USB oscilloscope and logic analyzer with two analog ports, a clone of the USBee AX Pro, so I decided to use it to look at the signals the receiver generated; the picture is included below.

    RF443-scope.PNG

    As you can see, I was forced to use Windows XP in a virtual machine to make it work. Although there is free software support for the logic analyzer part of it in sigrok, the developers aren't really excited about USBee AX Pro clones, so support for the analog channels is missing. There is code in the ax branch, but I haven't had a chance to try it yet. For a few months I thought that it didn't work at all, mostly because I was trying to make it work in kvm and VirtualBox. Finally, a friend suggested trying vmware, and it worked without a problem. Since there is an open source implementation of the USBee SX protocol, I think it's quite possible to extend it to support the analog ports on the AX Pro, but I haven't had time to do this yet. Keep in mind that buying USBee Test Pod clones is not something CWAV condones, and newer versions of the software from their web site will re-flash the device (the original is read only), which will make it unusable. DX has a long thread with dead links to the Cypress site on how to fix this, but the easiest solution I found so far was to go to the site of another clone called ESLA201A and download fix_esla201.rar, which includes everything you need to re-flash the USB ID back to USBee AX Pro in a single rar archive.

    09. 12. 2013.

    Bonbon i OpenWrt

     
    Bonbon provides very affordable Internet access. But if you want to share the connection with several people, you also need a portable wifi access point, naturally powered by OpenWrt :)
     
    Luckily, it's not too hard to set up OpenWrt for Internet access through a UMTS USB stick.
     
    All you need to do is add a new wan connection for the UMTS USB stick, but before that, install the packages for USB and UMTS support.
     
    opkg update
    opkg install comgt usb-modeswitch usb-modeswitch-data kmod-usb-serial-option kmod-usb-ohci kmod-usb2

     
    And after that, add the wan connection:
     
    config interface 'wan'
    option proto '3g'
    option device '/dev/ttyUSB0'
    option service 'umts'
    option apn 'web.htgprs'
    option pincode '0000' # your PIN number goes here

     
    And that's it.
     
    ps. for Tele2 the APN is internet.tele2.hr

    03. 12. 2013.

    Touch screen configuration using xinput

    When you are trying to configure a touch screen on a Linux machine, the internet offers example Xorg.conf configurations, but without an explanation of where the numbers in them came from. If you have a different touch screen you might be out of luck, or left guessing what to do. In this post, I will try to explain how to examine your device using evtest and try out settings using xinput, without restarting the X server or installing any drivers other than the built-in evdev.

    microtouch.jpg

    We have a couple of 3M MicroTouch M150 touch screens, which are VGA monitors (1024*768 resolution) with a USB touchscreen interface that is reported as:

    dpavlin@t42:~$ lsusb -d 0596:0001
    Bus 002 Device 002: ID 0596:0001 MicroTouch Systems, Inc. Touchscreen
    
    A bit of googling later, I found out that there are two different drivers for microtouch devices, but both of them support serial devices only. Not giving up that easily I decided to see what xinput reports about it (without any additional drivers installed!):
    dpavlin@t42:~$ xinput list
    ⎡ Virtual core pointer                          id=2    [master pointer  (3)]
    ⎜   ↳ Virtual core XTEST pointer                id=4    [slave  pointer  (2)]
    ⎜   ↳ 3M 3M USB Touchscreen - EX II             id=9    [slave  pointer  (2)]
    ⎜   ↳ SynPS/2 Synaptics TouchPad                id=11   [slave  pointer  (2)]
    ⎜   ↳ TPPS/2 IBM TrackPoint                     id=12   [slave  pointer  (2)]
    ⎣ Virtual core keyboard                         id=3    [master keyboard (2)]
        ↳ Virtual core XTEST keyboard               id=5    [slave  keyboard (3)]
        ↳ Power Button                              id=6    [slave  keyboard (3)]
        ↳ Video Bus                                 id=7    [slave  keyboard (3)]
        ↳ Sleep Button                              id=8    [slave  keyboard (3)]
        ↳ AT Translated Set 2 keyboard              id=10   [slave  keyboard (3)]
        ↳ ThinkPad Extra Buttons                    id=13   [slave  keyboard (3)]
    
    This seemed like good news, but when I tried to use it, the cursor would move only in the middle of the screen (with the X axis swapped), so I wasn't very happy about it. Examining the properties of the device in more detail revealed that it has properties to swap the axes and calibrate them, but what should be written into those values?
    dpavlin@t42:~$ xinput list-props 9
    Device '3M 3M USB Touchscreen - EX II':
            Device Enabled (139):   1
            Coordinate Transformation Matrix (141): 1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000, 0.000000, 0.000000, 1.000000
            Device Accel Profile (263):     0
            Device Accel Constant Deceleration (264):       1.000000
            Device Accel Adaptive Deceleration (265):       1.000000
            Device Accel Velocity Scaling (266):    10.000000
            Device Product ID (257):        1430, 1
            Device Node (258):      "/dev/input/event7"
            Evdev Axis Inversion (267):     0, 0
            Evdev Axis Calibration (268):   <no items>
            Evdev Axes Swap (269):  0
            Axis Labels (270):      "Abs X" (261), "Abs Y" (262)
            Button Labels (271):    "Button Unknown" (260), "Button Unknown" (260), "Button Unknown" (260), "Button Wheel Up" (145), "Button Wheel Down" (146)
            Evdev Middle Button Emulation (272):    0
            Evdev Middle Button Timeout (273):      50
            Evdev Third Button Emulation (274):     0
            Evdev Third Button Emulation Timeout (275):     1000
            Evdev Third Button Emulation Button (276):      3
            Evdev Third Button Emulation Threshold (277):   20
            Evdev Wheel Emulation (278):    0
            Evdev Wheel Emulation Axes (279):       0, 0, 4, 5
            Evdev Wheel Emulation Inertia (280):    10
            Evdev Wheel Emulation Timeout (281):    200
            Evdev Wheel Emulation Button (282):     4
            Evdev Drag Lock Buttons (283):  0
    
    The first task was to flip the x axis so the cursor moves left-right instead of right-left. This can be accomplished using the following command:
    dpavlin@t42:~$ xinput set-prop 9 267 1 0
    
    The parameters are device id, property id, X axis swap and Y axis swap. If you don't know how many parameters a property takes, just put one, try it out, and if it returns errors, keep adding parameters until it succeeds.

    Next, I needed to calibrate the screen to track my finger moving over the surface. This is where evtest comes into play. It's a low level utility which enables you to see input events before they are passed to the Xorg server. You will have to run it as root, as follows:

    dpavlin@t42:~$ sudo evtest
    No device specified, trying to scan all of /dev/input/event*
    Available devices:
    /dev/input/event0:      AT Translated Set 2 keyboard
    /dev/input/event1:      Lid Switch
    /dev/input/event2:      Sleep Button
    /dev/input/event3:      Power Button
    /dev/input/event4:      ThinkPad Extra Buttons
    /dev/input/event5:      Video Bus
    /dev/input/event6:      PC Speaker
    /dev/input/event7:      3M 3M USB Touchscreen - EX II
    /dev/input/event8:      SynPS/2 Synaptics TouchPad
    /dev/input/event9:      TPPS/2 IBM TrackPoint
    Select the device event number [0-9]: 7
    Input driver version is 1.0.1
    Input device ID: bus 0x3 vendor 0x596 product 0x1 version 0x410
    Input device name: "3M 3M USB Touchscreen - EX II"
    Supported events:
      Event type 0 (EV_SYN)
      Event type 1 (EV_KEY)
        Event code 330 (BTN_TOUCH)
      Event type 3 (EV_ABS)
        Event code 0 (ABS_X)
          Value   7353
          Min        0
          Max    16384
        Event code 1 (ABS_Y)
          Value   4717
          Min        0
          Max    16384
    Properties:
    Testing ... (interrupt to exit)
    
    Immediately we can see the minimum and maximum values for both axes, and putting a finger on the top-left corner of the screen produced (a lot of) output like this:
    Event: time 1386078786.506710, -------------- SYN_REPORT ------------
    Event: time 1386078786.510712, type 3 (EV_ABS), code 0 (ABS_X), value 13919
    Event: time 1386078786.510712, type 3 (EV_ABS), code 1 (ABS_Y), value 2782
    
    After a few touches I had coordinates which were something like this:
    14046,2722
    2380,2986
    7994,7819
    13743,13616
    2624,13545
    Strangely, it seems that the origin is the top-right corner, but we shouldn't care much about that, because we can specify the calibration using the following command (after rounding the values a bit):
    dpavlin@t42:~$ xinput set-prop 9 268 2380 14000 2800 13500
    
    Trying it out on the screen proved that it now works as expected. Let's call this a success and remember that current Xorg knows a lot of tricks itself (recognising USB touch devices is one of them).
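
    If you collect a handful of corner touches like the ones above, a tiny throwaway Python script can print the raw minimum and maximum values to round off and feed into that set-prop call (purely illustrative; the sample pairs are the approximate values from my touches):

    # (ABS_X, ABS_Y) pairs collected from evtest while touching the screen corners
    samples = [(14046, 2722), (2380, 2986), (7994, 7819), (13743, 13616), (2624, 13545)]

    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]

    # order expected by the Evdev Axis Calibration property: min-x max-x min-y max-y
    print(min(xs), max(xs), min(ys), max(ys))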

    As a side note, you don't really need to use evtest to get the device position. Using the xinput list id syntax displays more or less the same information, including the last point which you touched on the device, as seen below:

    dpavlin@t42:~$ xinput list 9
    3M 3M USB Touchscreen - EX II                   id=9    [slave  pointer  (2)]
            Reporting 3 classes:
                    Class originated from: 9. Type: XIButtonClass
                    Buttons supported: 5
                    Button labels: "Button Unknown" "Button Unknown" "Button Unknown" "Button Wheel Up" "Button Wheel Down"
                    Button state:
                    Class originated from: 9. Type: XIValuatorClass
                    Detail for Valuator 0:
                      Label: Abs X
                      Range: 0.000000 - 16384.000000
                      Resolution: 0 units/m
                      Mode: absolute
                      Current value: 13889.000000
                    Class originated from: 9. Type: XIValuatorClass
                    Detail for Valuator 1:
                      Label: Abs Y
                      Range: 0.000000 - 16384.000000
                      Resolution: 0 units/m
                      Mode: absolute
                      Current value: 2832.000000
    
    However, evtest runs in a loop until you stop it with Ctrl+C, so I find it a little bit easier to use than re-running xinput list <id>.
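    Another option that sits between the two: xinput has a test subcommand that streams motion and button events continuously, without needing root, until you interrupt it, so it behaves more like evtest than like re-running xinput list:

    dpavlin@t42:~$ xinput test 9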

    02. 12. 2013.

    compile latest tilda and enjoy solarized terminal

     
    Anybody who uses a Linux desktop as a workstation for prolonged periods knows what a difference a good colour scheme can make in reducing eye strain.
    After discovering the Solarized theme, there was no going back for me.
     
    One of my recent discoveries was tilda, an awesome drop-down terminal. The only thing missing from tilda that would make it a perfect tool was a Solarized theme. But fear not! Support for the Solarized theme is coming in the upcoming tilda 1.2. If you can't wait for the official release, you can compile it yourself.
     
    Install dependencies for Ubuntu:
    sudo apt-get install libgtk-3-dev libvte-2.90-dev libconfuse-dev
     
    For Fedora install these dependencies:
    sudo dnf install git automake libconfuse-devel vte3-devel gtk3-devel glib-devel gettext-devel gcc
     
    And now grab the sources, configure and compile tilda:

    git clone https://github.com/lanoxx/tilda.git
    cd tilda/
    ./autogen.sh --prefix=/usr
    make --silent
    sudo make install
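
    If the build went through, a quick sanity check (assuming /usr/bin, where the --prefix above puts the binary, is first on your PATH) and a test launch:

    which tilda
    tilda &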

     
    And that is it! Enjoy.
     

    01. 12. 2013.

    Closed public administration

    Someone complained that they were not listed in the voter registry, so I visited the Voter Registry of the Republic of Croatia (Registar birača Republike Hrvatske) to send them the details of the competent office where the required certificate can be obtained. The site is quite user-hostile: everything is loaded with JavaScript, so you cannot send anyone a link to a page with a specific piece of information (e.g. a link to the page with the competent office's details). But the biggest surprise was yet to come. Selecting and copying text is disabled on the site. The terms of use state that the data, images and text from the site may not be copied or used in other publications.


    Is information about a public administration office really something that must not be copied, used or shared?! The same goes for the addresses of polling stations. This is data that should be publicly available, that you should be able to copy, share and send to someone.


    Naturally, there is also a clumsy captcha where you can't tell whether a character is a lowercase l or an uppercase I.


    Obviously we can only dream about usable and open public services, because all we get are barely usable half-products fenced off with captchas.

    Site info

    Planet Linux.hr is an aggregation of Linux and Open Source themed blogs written by Croatian people from the whole wide world. Blog entries aggregated on this page are owned by, and represent the opinion of the author.


    Last updated: 20. 08. 2017. 20:00


    If you want your blog to be aggregated on this planet, contact Senko Rasic.