Sentence Clustering with the K-means Algorithm

This is an unsupervised learning project from the SJSU data mining course. Students need to group sentences (already converted to numeric form by the course professor) into clusters using the K-means algorithm. Below is the performance of the students in the class; it does not look bad. The whole project has 3 parts: data preprocessing, bisecting K-means, and the basic K-means algorithm. 1. Data preprocessing: the documents to be clustered contain 27673 terms in total, that is, the number of features is 27673. Here is an excerpt of the sample data. First, we will build...

......
Association Rule Analysis for Criminal data

This is an SJSU course project. We need to generate association rules from crime data to see which criminal activities cause issues. The data: it was obtained from the official DataSF website. The features: only a few features are useful for association rule generation. They are Category, Descript, DayOfWeek, Date/Time, PDDistrict, Resolution and Address. Below is some sample data. Data preprocessing: because some of the features above are similar to each other, or are not yet abstracted to a useful level, we should preprocess the data before we go on to the next algorithm step...

......
Text Sentiment Analysis Project with LSTM, CNN

This is a supervised learning project. The text sentiment analysis consists of three parts: text data preprocessing, text data representation, and three models (CNN, KNN, LSTM). This is an SJSU course project. I use Python to implement the models and generate the results. Here are the details. Data preprocessing: here is the sample data. The number on the left is the target sentiment label, and the text on the right is the input text. Judging from the text sample, the data most likely comes from Twitter-like samples collected by the professor...

......
TensorFlow 2. Shallow CNN example for MNIST data

This exercise is to understand how TensorFlow is applied to a shallow NN on the MNIST data. The exercise is from the Big Data University lectures. Reference: Support_Vector_Machines.html (Coursera Machine Learning Course), Big Data University TensorFlow course. Deep learning concept: using multiple processing layers with non-linear algorithms to simulate the brain's abilities; a branch of machine learning. We will focus on the shallow NN in this note. Shallow NN MNIST example: two or three layers only. In the context of supervised learning, digit recognition in our case, the learning consists of a target/feature which is to...

......
TensorFlow 101C. Image Texture

This note explains image texture. Reference: https://courses.cs.washington.edu/courses/cse576/book/ch7.pdf (computer vision). Why texture: texture gives us information about the spatial arrangement of the colors or intensities in an image. Why do we need it? The answer is that a histogram alone cannot fully represent or classify images. All images below are half white and half black, yet the images are different. How to recognize texture: Structural approach: texture is a set of primitive texels in some regular or repeated relationship. Statistical approach: texture is a quantitative measure of the arrangement of intensities in a region. Statistical method: Co-occurrence...
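The histogram argument above can be sketched in a few lines. This is only an illustration under assumptions: the two 4x4 binary images, the function names `histogram` and `cooccurrence`, and the choice of a one-pixel horizontal offset are all hypothetical and do not come from the original post.

```python
from collections import Counter

def histogram(img):
    """Count how many pixels have each intensity value."""
    return Counter(value for row in img for value in row)

def cooccurrence(img, dr=0, dc=1):
    """Count intensity pairs (p, q) where q sits at offset (dr, dc) from p."""
    counts = Counter()
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(img[r][c], img[r2][c2])] += 1
    return counts

# Two hypothetical images, both half black (0) and half white (1).
halves = [[0, 0, 1, 1] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(histogram(halves) == histogram(checker))  # same histogram
print(dict(cooccurrence(halves)))               # mostly equal neighbors
print(dict(cooccurrence(checker)))              # only alternating neighbors
```

The two images have identical histograms, but their co-occurrence counts differ, which is the kind of spatial information a statistical texture measure is meant to capture.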

......

VS

VS 2017 C Sharp and PDF Generator

VS 2017 C# with PDF generator. In a real application system, most of the time you have to generate PDFs for your users. For example, in an ERP you may create invoice PDFs and sales report PDFs. This tutorial shows you how to create a PDF with the MVC framework on the VS 2017 C# platform. I will illustrate the implementation from a simple PDF to a complex PDF. Create the PDF generation environment: add the iTextSharp package to your project. You have to install the iTextSharp-LGPL 4.1.6 version to avoid the license issue. ...

......

Java

Java bitCount algorithm explanation

It is not strange to have bitwise operators and left/right shifts in a function. But it is definitely weird for every statement in a function to use them. Maybe you think this never happens in real production code. You are wrong then: it exists in the Java SDK. Below is a copy of the Long.bitCount function. public static int bitCount(long i){          i = i - ((i >>> 1) & 0x5555555555555555L);          i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);      ...
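To see what the first two statements are doing, here is a minimal Python sketch of the same parallel bit-counting idea. It is an assumption-laden port, not the JDK source: the names `MASK64` and `bit_count_64` are hypothetical, the explicit 64-bit mask stands in for Java's fixed-width `long` and unsigned shift `>>>`, and the folding steps after the second statement follow the common parallel bit-count technique rather than the truncated excerpt.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit value; Python ints are unbounded

def bit_count_64(i: int) -> int:
    """Count set bits in the low 64 bits of i using parallel partial sums."""
    i &= MASK64  # treat negative inputs as their 64-bit two's complement
    # Each 2-bit field now holds the number of set bits it contained (0..2).
    i = i - ((i >> 1) & 0x5555555555555555)
    # Add neighboring 2-bit counts into 4-bit fields (0..4).
    i = (i & 0x3333333333333333) + ((i >> 2) & 0x3333333333333333)
    # Add neighboring 4-bit counts into bytes (0..8).
    i = (i + (i >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Fold the byte counts together; the total ends up in the lowest byte.
    i = i + (i >> 8)
    i = i + (i >> 16)
    i = i + (i >> 32)
    return i & 0x7F  # the total is at most 64, so 7 bits are enough

print(bit_count_64(0b1011))
print(bit_count_64(-1))
```

A quick way to gain confidence in the sketch is to compare it with `bin(x).count("1")` for a range of non-negative inputs.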

......
A note on BNF Parser to verify valid number: a problem from leetcode

A BNF parser is one of the steps in compiler theory for analyzing the syntactic validity of numbers, program statements, variables, etc. It is also a perfect example of a recursive algorithm if you plan to study and implement one. I got the question from leetcode.com. It is the hardest question in the problem list, which attracted me to study the parser implementation for the problem. I also read a compiler theory textbook two decades ago. I have to go over the first half of a compiler theory book and...

......
Unit Test Mocks and Related Terms (Test Doubles): a note and examples on Martin Fowler's old article

Ref: https://www.martinfowler.com/articles/mocksArentStubs.html http://xunitpatterns.com/Test%20Double.html https://github.com/kensipe/spock-mocks-nfjs http://nilhcem.com/FakeSMTP/. It is hard to avoid writing unit tests in a programming job. Of course, you can quit if you do not like writing them and your boss requires them. But it is better to write them anyway, to ensure your code is safe to run and your job is safe to keep. In most unit tests, the main object, i.e. the system under test (SUT), relies on dependency objects. For example, you have an Order class and a Warehouse class. When you fill the order in the Order.fill method, you...

......
A Simple Implementation of MVC Framework

Most of us use many Java MVC frameworks in software development. The most famous is Spring. Maybe the huge codebase of the Spring framework scares you away from implementing your own. Actually, it is not so hard to implement a simple MVC application without the Spring framework. In this article, we will demo a simple MVC inventory management mobile app in text mode (no GUI), which includes product in, product out and inventory tracking. We will use Chain...

......

Dos

DOS in Y2016 – How to Run It Smoothly in Your Clinic Lab

Tools/References used for the project: dosbox debug:  http://www.vogons.org/viewtopic.php?t=394 http://www.vcfed.org/forum/showthread.php?11320-NE2000-card-emulation-with-DOSBox- http://www.columbia.edu/~em36/pcltopdf.html https://sourceforge.net/projects/vjoystick/ https://en.wikipedia.org/wiki/Cyrix_6x86 OK. Let’s talk about the grandpa of operating systems, DOS. You may wonder: does DOS still survive in some corners of Silicon Valley in the USA, the most developed digital and AI-pioneering area? The answer is “IT DOES”. Here are some photos to rock you! From left to right, bottom to top, you can see DOS, 3 ISA cards, a clinic lab PC and an 80486 CPU. Those machines are used for work endurance tests for workers, who are employed widely in...

......

Misc

Tax Incidence formulas in the view of producer and consumer

My daughter attends her first college class, Microeconomics, in the summer of her junior year. So it is time to go over some economics concepts that I learned before, to prepare for questions from my daughter. I got the textbook by Prof. Jeffrey M. Perloff, Microeconomics: Theory and Applications with Calculus (Pearson, 4th). Of course, this textbook, full of math formulas, is different from my daughter’s economics class. But I always prefer math formulas to descriptions. When reading the tax incidence formula in the textbook, it is different than the formula...

......
How to get a good score in SAT/PSAT: a Journey from ELD to SAT 1580

It has been 2 months since the official SAT/PSAT scores were published. I have two kids in high school. Both got wonderful results on their first SAT and first PSAT. I am wondering how it happened and would like to reflect on it so that it may help other parents prepare. Let’s view the scores first. My elder child got 1580 on the SAT in November 2018 and 1500 on the PSAT in October 2018. My younger child took only the PSAT and got 1480. The background...

......

PGM

A note on the eigenvector used in the Markov Steady State: using P transpose or P to calculate the eigenvector

A note on the eigenvector used in the Markov steady state: using P transpose or P to calculate the eigenvector? Last night, a classmate of my friend asked a good question about the eigenvector used for the Markov steady state. Do we use the Markov transition probability matrix to calculate the eigenvector, or do we use its transpose? Why? Here is an example. The P matrix below is the Markov transition probability matrix: the probabilities in each row sum to 1. You can picture the 3 nodes with the transition graph below: We can compute...
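For a row-stochastic P (each row sums to 1), the steady-state row vector pi satisfies pi P = pi, which is the same as saying that pi is an eigenvector of P transpose with eigenvalue 1. The eigenvector of P itself for eigenvalue 1 is just the all-ones vector, which carries no information about the steady state. The sketch below illustrates this with a hypothetical 3-state matrix (the numbers and the function names are assumptions, not the matrix from the original post) and uses plain power iteration instead of a linear algebra library.

```python
def left_multiply(pi, P):
    """Return the row vector pi P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def right_multiply(P, v):
    """Return the column vector P v."""
    n = len(P)
    return [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]

def steady_state(P, iterations=1000):
    """Power iteration on the left: repeatedly apply pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = left_multiply(pi, P)
    return pi

# Hypothetical row-stochastic transition matrix (every row sums to 1).
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

pi = steady_state(P)
print(pi)                         # steady state: unchanged by pi P
print(left_multiply(pi, P))       # same vector again
print(right_multiply(P, [1.0, 1.0, 1.0]))  # P times ones is just ones
```

With a numerical library, the same vector would come from the eigenvalue-1 eigenvector of the transposed matrix, rescaled so its entries sum to 1.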

......
SVM Manual Maximization Procedure

Start from a trivial example with 4 sample points: +(1,0), +(2,0), -(-1,0), -(-2,0). What are the SVM boundary and the support vectors? It is easy to see that the closest positive (+) and negative (-) examples are (1,0) and (-1,0) respectively, and that the SVM boundary should be x=0. OK, how can we derive it analytically from the SVM concept, i.e. by maximizing the width between the closest positive and negative samples? Let the SVM boundary be ax+by+c=0. The support vectors lie on ax+by+c=d and ax+by+c=-d (d>0). Set the two...
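The claimed answer can be checked numerically. The sketch below does not perform the maximization; it only tests whether a candidate (a, b, c, d) keeps every sample on the correct side of its margin line and reports the margin width 2d / sqrt(a^2 + b^2). The function name `check_boundary` and the tolerance are assumptions made for illustration.

```python
import math

positives = [(1, 0), (2, 0)]
negatives = [(-1, 0), (-2, 0)]

def check_boundary(a, b, c, d, tol=1e-12):
    """Return (feasible, width, support_vectors) for boundary ax+by+c=0.

    Feasible means ax+by+c >= d for every positive sample and
    ax+by+c <= -d for every negative sample.
    """
    support = []
    for label, points in ((+1, positives), (-1, negatives)):
        for x, y in points:
            value = label * (a * x + b * y + c)
            if value < d - tol:
                return False, 0.0, []
            if abs(value - d) <= tol:
                support.append((x, y))  # the sample lies on a margin line
    width = 2 * d / math.hypot(a, b)
    return True, width, support

print(check_boundary(1, 0, 0, 1))      # boundary x = 0, margin lines x = 1 and x = -1
print(check_boundary(1, 0, 0.5, 0.5))  # shifted boundary x = -0.5, narrower margin
print(check_boundary(1, 0, 0.5, 1))    # not feasible: (-1, 0) falls inside the margin
```

The boundary x=0 with d=1 is feasible with width 2 and touches both (1,0) and (-1,0), while the shifted boundary is feasible only with a narrower margin.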

......
Complete and Simple PCA SVD Tutorial Note

Ref:   http://setosa.io/ev/principal-component-analysis/ https://matthew-brett.github.io/teaching/pca_introduction.html https://blog.statsbot.co/singular-value-decomposition-tutorial-52c695315254 http://www.bluebit.gr/matrix-calculator/calculate.aspx PCA is the main method for reducing the number of features/variables before you train on your data in machine learning. It uses the K transformed features with the highest variance to represent the original N features (assume N>>K). For example, we have the consumption of 17 types of food, in grams per person per week, for every country in the UK. Even after viewing the above table for 5 minutes, you can hardly find any patterns. But if you use PCA to extract the...

......
Likelihood vs. Probability

Reference  LikelyhoodFunction_The world is a complex place.pdf Example Likelihood:  when an event......

Junction Tree local consistency and global consistency

This note describes the local consistency and global consistency of the junction tree (textbook: P109, Example 6.1). References: pgm_Princeton_COS513 Foundations of Probabilistic Modelinglecture7.pdf; gouws_python_2010: a master's thesis on how to implement graphical models with Python. Text: Bayesian Reasoning and Machine Learning. Junction tree property (JTP): For each pair U, V of cliques with intersection S, all cliques on the path between U and V contain S (from gouws_python_2010.pdf). Example 1, to reflect the property: add separators in diagram b), and you may find...

......

DB

Docker Container deployment with multi layers

This is an SJSU lecture course project. You need to deploy a list of services with Docker. That is not difficult if there are just one or two. However, if it is a list of Docker containers in several layers, it becomes a headache. Anyway, you need to finish it to avoid an F in the course. The requested deployment diagram. The explanation: 1) 1st layer: the front-end GUI layer with a Node.js app. 2) 2nd layer: the API gateway, which dispatches calls to the right service with Kong Gateway. 3) 3rd layer: the EC2 ELB application layer to...

......
Note on Fowler Nosql Distilled and SJSU CMPE 281 Course

For the Cloud Technology course, Fall 2018. Fowler’s “NoSql Distilled” book is a good tutorial for anyone who wants to step into the new “polyglot persistence” world. Before this course, I focused only on relational databases (RDB), such as mssql, mysql, oracle, etc. Actually, I did not attend the course lectures in class. Instead, I followed the syllabus and the labs to finish the course study. I combine the book notes and the course lab notes here to make a brief tutorial on NoSQL databases. The Book Distilled...

......